(4) Introduction


Treatment  of  breast  cancer  at  an  early  stage  can  significantly  improve  the  survival  rate  of 
patients.  Mammography  is  currently  the  most  sensitive  method  for  detecting  early  breast  cancer  [1, 
2],  and  it  is  also  the  most  practical  for  screening.  Although  general  rules  for  the  differentiation 
between  malignant  and  benign  lesions  exist,  in  clinical  practice,  only  15-30%  of  cases  referred  for 
surgical  biopsy  are  actually  malignant.  A  number  of  research  groups  are  in  the  process  of  developing 
computer-aided  diagnosis  (CAD)  methods  which  can  provide  a  second  opinion  to  the  radiologist  for 
the  detection  and  classification  of  breast  abnormalities. 

Radiologists routinely compare mammograms of different views of a patient with those
obtained in previous years to identify interval changes, detect potential abnormalities, and
evaluate breast lesions. It is widely accepted that interval changes in mammographic features are
very useful for both detection and classification of breast abnormalities. Some existing CAD
techniques use information from multiple views of the same breast. Others use previous
mammograms for detection. However, none incorporates information about the temporal
mammographic changes in the breast tissue for classification.

The  goal  of  this  project  is  to  evaluate  the  usefulness  of  using  interval  changes  to  distinguish 
between  normal  structures,  benign  masses,  and  malignant  masses  in  CAD.  The  purpose  of  this  study 
is  summarized  as  follows:  1.  Characterize  temporal  changes  in  terms  of  the  mammographic  features 
of  normal  breast  structures,  as  well  as  benign  and  malignant  masses.  2.  Use  this  information  to 
develop methods for CAD. We hypothesize that the use of temporal changes in mammographic
features between current and previous mammograms of the patient will improve the success of CAD
techniques for classification of masses. It is therefore expected that the use of such temporal
information  will  improve  the  positive  predictive  value  of  mammography  by  reducing  benign 
biopsies,  and  hence  reduce  both  cost  and  patient  morbidity. 

To  accomplish  this  goal  we  will  first  develop  and  evaluate  reliable  techniques  for  the 
temporal  regional  registration  of  mammograms  of  the  same  patient.  The  temporal  mammogram 
registration  technique  we  have  developed  is  a  novel  approach  in  which  the  computer  emulates  the 
search  method  used  by  many  radiologists  for  finding  corresponding  structures  on  mammograms.  The 
method  aims  at  registering  a  small  region  containing  a  suspected  mass  on  the  most  recent 
mammogram  of  the  patient  with  one  on  a  mammogram  obtained  from  a  previous  year.  Our  regional 
registration  technique  involves  three  steps:  (1)  identification  of  a  suspicious  structure  on  the  most 
recent  mammogram,  (2)  initial  estimation  of  the  location  on  a  previous  mammogram  of  the  region 
corresponding  to  the  suspicious  structure  and  the  definition  of  a  search  region  which  encloses  the 
object  of  interest  on  the  previous  mammogram,  and  (3)  accurate  identification  of  the  location  of  the 
matched  object  within  the  search  region.  The  characteristic  features  of  the  two  matched  lesions  then 
will  be  automatically  extracted  and  interval  changes  estimated.  This  interval  change  information  will 
be  incorporated  in  an  integrated  CAD  system. 
(5) Body

In the fourth year (4/6/01-4/5/02) of this grant, we have performed the following studies:

(A)  Database  collection  and  extraction  of  regions  of  interest  (Task  1) 

We  continued  collecting  the  data  set  for  this  study  from  the  files  of  patients  who  had 
undergone biopsy at the University of Michigan. The mammograms are scanned and the images are
saved on our storage device using an automated graphical user interface developed in our laboratory.
Additionally, the film information is recorded in a Microsoft Access database. Temporal pairs of
images  were  obtained.  The  current  mammogram  of  each  temporal  pair  exhibited  a  biopsy-proven 
mass.  We  scan  both  cranio-caudal  and  mediolateral-oblique  views.  The  mammograms  were  digitized 
with  a  LUMISCAN  85  laser  scanner  at  a  pixel  resolution  of  0.05  mm  x  0.05  mm  and  with  12-bit 
resolution. 

While  the  regional  registration  technique  can  be  used  for  determining  a  corresponding 
structure  or  region  for  any  structure  (both  normal  tissues  and  masses)  in  the  breast,  in  this  study  we 
are  analyzing  its  accuracy  on  biopsy-proven  masses  alone.  The  location  of  the  mass  on  the  current 
mammogram is identified by a Mammography Quality Standards Act (MQSA)-approved
radiologist  experienced  in  breast  imaging  using  an  interactive  image  analysis  tool  on  a  UNIX 
workstation.  To  provide  the  ground  truth  for  evaluation  of  the  computerized  method,  the  radiologist 
manually  identifies  the  corresponding  region  on  the  previous  mammogram.  Bounding  boxes 
enclosing  the  mass  on  the  current  mammogram  and  the  corresponding  object  on  the  previous 
mammogram are provided by the radiologist for each case. Each mass and the corresponding
structure on the previous mammogram are rated for visibility on a scale of 1 to 10, where a
rating of 1 corresponds to the most visible category. The size of the mass on the current
mammogram  as  well  as  the  size  of  the  corresponding  structure  on  the  previous  mammogram  are  also 
measured  by  the  radiologist.  The  parenchymal  density  is  rated  based  on  the  Breast  Imaging 
Reporting  and  Data  System  (BI-RADS)  lexicon. 

(B)  Further  developments  of  methods  for  establishing  corresponding  locations  in  current 
and  previous  mammograms  (Task  3) 

We  continued  to  improve  our  regional  registration  technique  [3-6].  In  the  first  step,  we  are 
still  working  on  development  of  an  automated  method  that  will  detect  the  nipple  location  in  the 
breast  image.  The  method  is  based  on  both  the  change  of  tangential  direction  and  the  change  in  the 
tissue  density  along  the  breast  border. 

In the third and final step, we have designed a new method, an adaptive similarity measure
(ASM), to improve automated identification of corresponding lesions on prior mammograms.

We are developing a new class of similarity measures (SM). It combines adaptive filtering
to enhance the lesion and an SM as a figure-of-merit (FOM) measure. The filters are designed with a
training  set  to  maximize  and  minimize  the  FOM  for  the  similar  and  dissimilar  image  pairs, 
respectively,  by  using  a  gradient  optimization  technique.  The  ASM  is  applied  to  the  final  stage  of 
our  multistage  regional  registration  technique  for  mass  identification  on  the  prior  mammogram.  A 
search  for  the  best  match  between  the  lesion  template  from  the  current  mammogram  and  a 
structure  on  the  prior  mammogram  is  carried  out  within  a  search  region,  guided  by  the  ASM. 
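
The search step described above can be sketched as follows. This is only an illustration under stated assumptions: the similarity function is a plain correlation coefficient (the conventional correlation SM used as the comparison baseline), not the ASM itself, whose trained filters are not specified here. The function names `correlation` and `best_match` and the toy pixel values are hypothetical.

```python
import math

def correlation(a, b):
    """Correlation coefficient between two equally sized 2-D patches (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # flat patch: correlation undefined, treated as "no match"
    return cov / math.sqrt(vx * vy)

def best_match(template, search_region, similarity=correlation):
    """Slide the template over the search region; return (row, col, score) of the best match."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search_region), len(search_region[0])
    best = (0, 0, -math.inf)
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            patch = [row[c:c + tw] for row in search_region[r:r + th]]
            score = similarity(template, patch)
            if score > best[2]:
                best = (r, c, score)
    return best

# Toy data: the template (from the "current" image) is embedded at row 1, col 1
# of the search region (from the "prior" image).
template = [[1, 2], [3, 9]]
region = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 9, 0],
    [0, 0, 0, 0],
]
row, col, score = best_match(template, region)
```

Passing the measure as the `similarity` argument keeps the search loop independent of the measure, so a different SM could be substituted without changing the search itself.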

This  new  approach  was  evaluated  by  using  179  temporal  pairs  of  mammograms  containing 
biopsy-proven  masses. 

We  found  that  86%  of  the  estimated  lesion  locations  resulted  in  an  area  overlap  of  at  least 
50%  with  the  true  lesion  locations.  The  average  distance  between  the  estimated  and  the  true  lesion 
centroids on the prior mammogram was 4.5 ± 6.7 mm. In comparison, the correct localization and
the  average  distance  using  a  conventional  correlation  SM  were  84%  and  4.9  ±  7.0  mm,  respectively. 

The  preliminary  results  of  this  study  are  promising.  The  ASM  improved  the  identification 
of  the  corresponding  lesions  on  temporal  pairs  of  mammograms.  The  average  Euclidean  distance 
between  the  computer  estimate  of  the  corresponding  structure  and  the  radiologist-identified 
location  and  their  standard  deviation  were  both  reduced  when  compared  to  multistage  registration 
using  a  conventional  correlation  SM. 

We will present the preliminary results on this improved method at the International
Workshop on Digital Mammography (IWDM), Bremen, Germany, June 22-25, 2002 [22].

In previous years, when we increased the data set from 124 to 179 temporal pairs, the
detection accuracy was slightly reduced. For 124 temporal pairs, 87% of the estimated lesion
locations resulted in an area overlap of at least 50% with the true lesion locations [5], [6]. For 179
pairs, it was reduced to 84%. The average distance between the estimated and the true lesion
centroids on the prior mammogram for 124 temporal pairs was 4.2 ± 5.7 mm. For 179 temporal
pairs, the average distance was 4.9 ± 7.0 mm. The main reason for the reduction in detection
accuracy was that the 55 new temporal pairs were more difficult. They included subtle masses
surrounded by breast densities with a brighter mammographic appearance, which made
detection more difficult. To overcome this problem, we continued to improve
the detection methods. We introduced the density-weighted contrast enhancement (DWCE) technique
[7] to improve the localization of the corresponding mass on the prior mammogram. The average
distance between the estimated and the true centroid of the lesions on the prior mammogram was
4.8 ± 6.9 mm, a slight improvement; however, 84% of the estimated lesion locations resulted
in an area overlap of at least 50% with the true lesion locations, which showed no improvement.
With the ASM method (described above), the correct localization and the average distance
were 86% and 4.5 ± 6.7 mm, respectively, an improvement over both the conventional and
DWCE methods.

We  will  continue  our  studies  to  improve  the  technique,  expand  it  to  different  types  of  SM, 
and  evaluate  its  accuracy  on  a  larger  data  set. 

(C)  Obtaining  hand  drawn  mass  boundaries  from  radiologists  and  evaluation  of 
segmentation  accuracy  (Task  9) 

As we reported before, we carried out an evaluation of the segmentation technique by
comparison  of  the  computer  segmentations  (K-means  clustering  [8]  and  active  contour  [9-11])  with 
hand  segmentations  using  the  expertise  of  the  radiologist. 

Obtaining hand drawn mass boundaries from radiologists

An MQSA-approved radiologist experienced in breast imaging outlined the boundaries of
the masses on 239 regions of interest (ROIs) using an interactive image analysis tool on a UNIX
workstation.

In the coming year, additional MQSA-approved radiologists experienced in breast imaging will
hand segment the boundaries of the masses on the ROIs.

Formulate  quantitative  measures  for  assessing  segmentation  accuracy 


For  the  purpose  of  our  accuracy  analysis,  the  radiologist’s  hand  segmentations  were  used  to 
compare  with  the  computer  segmentations.  Three  quantitative  measures  were  used  for  evaluation 
of  the  accuracy  of  the  computer  segmentations:  Hausdorff  distance,  average  Hausdorff  distance  and 
the  area  overlap  measure. 

The Hausdorff distance between two curves is defined as the maximum of the closest point distances
(DCPs)  between  the  two  curves  [12],  [13].  The  closest  point  distance  (DCP)  associates  each  point 
on  both  curves  to  a  point  on  the  other  curve,  and  the  Hausdorff  distance  finds  the  largest  distance 
between  the  associated  points.  The  average  Hausdorff  distance,  on  the  other  hand,  finds  the 
average  of  the  DCPs  between  the  two  curves. 
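
The two distance measures can be sketched in a few lines. This is a minimal illustration, assuming each curve is given as a list of (x, y) points and that the average pools the DCPs from both directions. The pooling choice, the function names and the toy coordinates are assumptions, not details taken from [12], [13].

```python
import math

def closest_point_distances(curve_a, curve_b):
    """DCP from each point on curve_a to its nearest point on curve_b."""
    return [min(math.dist(p, q) for q in curve_b) for p in curve_a]

def all_dcps(curve_a, curve_b):
    """DCPs in both directions, since each point on both curves is associated with the other curve."""
    return closest_point_distances(curve_a, curve_b) + closest_point_distances(curve_b, curve_a)

def hausdorff(curve_a, curve_b):
    """Largest closest-point distance between the two curves."""
    return max(all_dcps(curve_a, curve_b))

def average_hausdorff(curve_a, curve_b):
    """Mean closest-point distance between the two curves."""
    dcps = all_dcps(curve_a, curve_b)
    return sum(dcps) / len(dcps)

# Toy outlines: identical except for one displaced point.
hand = [(0, 0), (0, 1), (1, 1), (1, 0)]
computer = [(0, 0), (0, 1), (1, 1), (3, 0)]

h_max = hausdorff(hand, computer)          # driven by the single outlying point
h_avg = average_hausdorff(hand, computer)  # much smaller, most points coincide
```

The toy example shows why both measures are reported: the maximum is sensitive to a single badly placed boundary point, while the average reflects the overall agreement of the outlines.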

Area overlap is defined as follows:

$$\text{Area overlap} = \frac{A_1 \cap A_2}{A_1 \cup A_2} \times 100,$$

where $A_1$ is the area inside the hand-segmented mass outline and $A_2$ is the area inside the
computer-segmented mass outline.
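
As an illustration, the area overlap (intersection of the two segmented areas divided by their union, times 100) can be computed from pixel sets. The sketch assumes each segmentation is available as a collection of (row, column) pixel coordinates inside the outline. The helper names and the rectangular toy regions are hypothetical.

```python
def area_overlap(hand_pixels, computer_pixels):
    """Percentage overlap: |A1 ∩ A2| / |A1 ∪ A2| × 100, with areas counted in pixels."""
    a1, a2 = set(hand_pixels), set(computer_pixels)
    union = a1 | a2
    if not union:
        raise ValueError("both segmentations are empty")
    return 100.0 * len(a1 & a2) / len(union)

def box(r0, c0, r1, c1):
    """Pixels of a rectangle with rows r0..r1-1 and columns c0..c1-1 (toy stand-in for a mass)."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

hand = box(0, 0, 4, 4)      # 16 pixels
computer = box(2, 0, 6, 4)  # 16 pixels, shifted down by two rows
overlap = area_overlap(hand, computer)  # 8 shared pixels out of 24 in the union
```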


Evaluate  quantitatively  the  accuracy  of  computer  boundary  segmentation  using  radiologists 
hand  segmentation 

Here we summarize again the segmentation results. For this study we used 239 ROIs. All
results presented in the following are averages over these 239 ROIs.

The results of the first stage of segmentation by the K-means clustering algorithm are the
following: area overlap of 40%, Hausdorff distance of 5.58 mm, and average
Hausdorff distance of 2.19 mm, each averaged over the 239 ROIs.

The results of the active contour segmentation are the following: area overlap of
67%, Hausdorff distance of 4.49 mm, and average Hausdorff distance of 1.27 mm, each averaged
over the 239 ROIs.

The active contour segmentation improved the segmentation accuracy. The above results
confirm the visually satisfactory agreement between the active contour segmentation and
the radiologist’s hand segmentation.


Evaluate  qualitatively  the  computer  segmentation  by  means  of  an  observer  study 

Our experience showed that radiologists gave very subjective estimates with large
variation within the data set when they visually evaluated the computer segmentation. We found
that the radiologist’s hand segmentation and a quantitative comparison, as described above, are a better way
to evaluate the accuracy of the computer segmentation. This quantitative study will replace the
visual qualitative evaluation proposed in our original research plan.


(D)  Further  develop  methods  for  extracting  morphological  and  texture  features  from 
masses  segmented  from  ROIs  extracted  from  current  and  prior  mammograms  (Task 
10) 

As we reported previously, we developed methods for extraction of texture, morphological,
and spiculation features from the segmented masses based on active contour segmentation [9-11].
Since feature extraction is very important for classification and we used these methods
intensively during the past year, we summarize them here.

The texture features used in this study were calculated from run-length statistics (RLS)
matrices [14]. The RLS matrices were computed from the images obtained by the rubber band
straightening transform (RBST) [15]. The RBST maps a band of pixels surrounding the mass onto
the Cartesian plane (a rectangular region). In the transformed image, the mass border appears
approximately as a horizontal edge, and spiculations appear approximately as vertical lines.

RLS texture features were extracted from the vertical and horizontal gradient magnitude
images, which were obtained by filtering the RBST image with horizontally or vertically oriented
Sobel filters and computing the absolute gradient values of the filtered image. Five texture
measures, namely, short run emphasis (SRE), long run emphasis (LRE), gray level nonuniformity
(GLN), run length nonuniformity (RLN), and run percentage (RP) were extracted from the vertical
and horizontal gradient images in two directions, θ = 0° and θ = 90°. Therefore, a total of 20
RLS features (5 measures x 2 gradient images x 2 directions) were calculated for each ROI. The
definition of the RLS feature measures can be found in the literature [14].
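
To make the five measures concrete, the sketch below computes them for a tiny image using the commonly cited run-length definitions. It is assumed, but not confirmed by this report, that these match the definitions in [14]. The function names are hypothetical, and a real gradient magnitude image would first be quantized to a limited number of gray levels.

```python
from collections import Counter

def run_length_counts(image, direction=0):
    """Count runs p(gray_level, run_length); direction 0 scans rows, 90 scans columns."""
    lines = image if direction == 0 else [list(col) for col in zip(*image)]
    counts = Counter()
    for line in lines:
        run_value, run_len = line[0], 1
        for v in line[1:]:
            if v == run_value:
                run_len += 1
            else:
                counts[(run_value, run_len)] += 1
                run_value, run_len = v, 1
        counts[(run_value, run_len)] += 1  # close the last run of the line
    return counts

def rls_features(image, direction=0):
    """SRE, LRE, GLN, RLN and RP for one image and one scan direction."""
    p = run_length_counts(image, direction)
    n_runs = sum(p.values())
    n_pixels = sum(len(row) for row in image)
    by_gray, by_length = Counter(), Counter()
    for (gray, length), n in p.items():
        by_gray[gray] += n
        by_length[length] += n
    return {
        "SRE": sum(n / length ** 2 for (_, length), n in p.items()) / n_runs,
        "LRE": sum(n * length ** 2 for (_, length), n in p.items()) / n_runs,
        "GLN": sum(n ** 2 for n in by_gray.values()) / n_runs,
        "RLN": sum(n ** 2 for n in by_length.values()) / n_runs,
        "RP": n_runs / n_pixels,
    }

# Toy 3 x 4 image with three gray levels.
image = [
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [2, 1, 1, 1],
]
features_0 = rls_features(image, direction=0)
features_90 = rls_features(image, direction=90)
```

Applying `rls_features` to the horizontal and vertical gradient images in both directions would give the 5 x 2 x 2 = 20 values per ROI described above.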

The morphological features were extracted from the automatically segmented mass shape.
Five of the morphological features were based on the normalized radial length (NRL), defined as the
Euclidean distance from the object’s centroid to each of its edge pixels, i.e., the radial length,
normalized relative to the maximum radial length for the object [16]. The following five NRL
features were extracted: mean (NRLAVG), standard deviation (NRLSD), entropy (NRLENT), area
ratio (NRLAREAR), and zero crossing count (NRLZCC). In addition, the perimeter (PERM), area
(AREA), circularity (CIRC), rectangularity (SQR), contrast (CONT), perimeter-to-area ratio (CRR)
and Fourier descriptor (FF) features were extracted. The detailed definition of the morphological
features can be found in [17], [18].
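
As a small worked example of the NRL definition, the sketch below computes the normalized radial lengths and the first two NRL features for a toy outline. The function name, the sample coordinates, and the use of the population standard deviation are assumptions made for illustration; the exact definitions are those of [16]-[18].

```python
import math
from statistics import mean, pstdev

def normalized_radial_lengths(centroid, edge_pixels):
    """Distance from the centroid to each edge pixel, divided by the maximum distance."""
    radial = [math.dist(centroid, p) for p in edge_pixels]
    longest = max(radial)
    if longest == 0:
        raise ValueError("all edge pixels coincide with the centroid")
    return [r / longest for r in radial]

# Toy object: centroid at the origin, four edge pixels (hypothetical values).
centroid = (0.0, 0.0)
edge_pixels = [(3.0, 4.0), (-5.0, 0.0), (0.0, 5.0), (0.0, -2.5)]

nrl = normalized_radial_lengths(centroid, edge_pixels)  # [1.0, 1.0, 1.0, 0.5]
nrl_avg = mean(nrl)   # corresponds to NRLAVG
nrl_sd = pstdev(nrl)  # corresponds to NRLSD (population form assumed)
```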

A spiculation measure was defined for each pixel on the mass border by using the statistics
of the image gradient direction relative to the normal direction to the mass border in a ring of
pixels surrounding the mass [17], [19]. The spiculation measure for each border pixel was
normalized to be between 0 and π/2, with a value of π/4 indicating a random orientation of image
gradients, and larger values indicating a higher likelihood of spiculation. Three features were
extracted from the spiculation measure. The first feature (AVG) was the average of the spiculation
measure for all pixels on the mass boundary. The second feature (PERC_AEV) was the
percentage of border pixels with a spiculation measure larger than π/4, and the third feature
(AVE_AEV) was the average of the spiculation measure for those pixels with a spiculation
measure larger than π/4.

A  total  of  35  features  (20  RLS,  12  morphological  and  3  spiculation)  were  therefore 
extracted  from  each  ROI. 

A more detailed description of the above methods can be found in [23].

(E)  Analyze  techniques  for  characterizing  differences  in  these  features  (Task  11) 

Additionally,  difference  features  were  obtained  by  subtracting  a  prior  feature  from  the 
corresponding  current  feature.  Therefore,  35  difference  features  were  derived  from  the  20  RLS,  12 
morphological  and  3  spiculation  features. 
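
The difference features can be illustrated with a short sketch that subtracts each prior feature from the corresponding current feature and places the current, prior and difference values side by side in one feature vector. The function name, the `current_`/`prior_`/`diff_` prefixes, the feature labels and the numeric values are all hypothetical.

```python
def temporal_feature_vector(current, prior):
    """Combine current, prior and difference (current - prior) features into one dict."""
    if current.keys() != prior.keys():
        raise ValueError("current and prior must contain the same feature names")
    combined = {}
    for name in current:
        combined[f"current_{name}"] = current[name]
        combined[f"prior_{name}"] = prior[name]
        combined[f"diff_{name}"] = current[name] - prior[name]
    return combined

# Two made-up features for one temporal pair (values are not from the study).
current = {"H_SRE_0": 0.75, "AVG": 1.0}
prior = {"H_SRE_0": 0.5, "AVG": 0.25}

vector = temporal_feature_vector(current, prior)
```

With 35 features per ROI, a vector built this way would hold 35 current, 35 prior and 35 difference values, from which a feature selection step could then choose.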

We designed a new classification scheme allowing a direct merge of current and prior
information. The input feature space to the classifier includes the current, prior and difference
features. This allows the classifier to choose the individual current and prior features or the
difference features. Stepwise feature selection with simplex optimization is used to select the
optimal feature subset. A linear discriminant analysis (LDA) classifier is used to merge the selected
features for classification of malignant and benign masses. A leave-one-case-out training and
testing resampling scheme is used for feature selection and classification.

We have published the results of this method in the journal Medical Physics [23] and
presented them at the RSNA meeting [20] and the SPIE meeting [21].

(F) Evaluate the effectiveness of LDA classifiers and neural networks for classification
(Task 12)

At this stage of our study we used 57 biopsy-proven masses (33 malignant and 24 benign) in
56 cases. The 241 mammograms contained different mammographic views (CC, MLO, and
lateral views) and multiple examinations of the masses including the examination when the biopsy
decision was made. By matching masses of the same view from two different examinations, a total
of 140 temporal pairs were formed, of which 85 were malignant and 55 benign. A malignant
temporal pair consisted of a biopsy-proven malignant mass or a mass that was initially not
recommended for biopsy and later found to be malignant by biopsy in a future year. A similar
definition  was  used  for  the  benign  temporal  pairs.  Within  a  pair,  the  current  mammogram  was 
defined  as  the  mammogram  with  the  later  date,  and  the  prior  mammogram  was  defined  as  the  one 
with  the  earlier  date.  Therefore,  in  cases  with  three  consecutive  exams,  more  than  one  temporal 
pair  could  be  formed  and  two  of  the  mammograms  could  be  called  “current”.  Among  the  140 
temporal  pairs,  we  had  120  unique  current  mammograms.  Of  the  masses  in  the  120  current 
mammograms,  70  were  malignant  and  50  benign. 

The  current,  prior,  and  difference  features  formed  a  multidimensional  feature  space  for  the 
classification  task.  Stepwise  feature  selection  applied  to  linear  discriminant  analysis  (LDA)  was 
used  to  select  the  most  useful  features.  The  selected  features  were  then  used  as  the  input  predictor 
variables  for  the  LDA  classifier.  The  classifier  was  trained  and  tested  by  a  leave-one-case-out 
resampling  scheme.  A  case  was  considered  to  contain  all  ROIs  from  a  given  patient.  In  each 
resampling  step,  the  temporal  pairs  from  55  cases  were  used  for  feature  selection  and  formulation 
of  the  linear  discriminant  function,  while  the  temporal  pairs  from  the  left-out  case  were  used  for 
testing  the  trained  classifier.  A  total  of  56  training  and  testing  steps  were  obtained  from  the  56 
cases.  The  classification  results  from  the  56  test  cases  were  accumulated  to  evaluate  the  classifier 
performance.  Since  the  data  set  in  this  study  was  still  small,  we  chose  the  feature  selection 
parameters  such  that  the  dimensionality  of  the  input  feature  vector  for  the  LDA  classifier  was  small 
in  order  to  reduce  the  possibility  of  over-training. 
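
The leave-one-case-out partitioning can be sketched as follows. The key point is that all temporal pairs from one case are held out together, so no patient contributes to both training and testing in the same step. The function name and the case and pair identifiers are hypothetical, and feature selection and LDA training are left out of the sketch.

```python
def leave_one_case_out(pairs):
    """Yield (held_out_case, training_pairs, test_pairs), holding out one case at a time.

    `pairs` is a list of (case_id, temporal_pair) tuples; a case may own several pairs.
    """
    case_ids = sorted({case for case, _ in pairs})
    for held_out in case_ids:
        train = [p for p in pairs if p[0] != held_out]
        test = [p for p in pairs if p[0] == held_out]
        yield held_out, train, test

# Toy data set: three cases owning two, one and three temporal pairs.
pairs = [
    ("case01", "pair_a"), ("case01", "pair_b"),
    ("case02", "pair_c"),
    ("case03", "pair_d"), ("case03", "pair_e"), ("case03", "pair_f"),
]
splits = list(leave_one_case_out(pairs))
```

In the study described above, the same scheme over 56 cases gives 56 training and testing steps, with feature selection repeated inside each training partition.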

To evaluate the improvement in performance of the classifier designed using the temporal
change information, two additional classifiers with different input features were designed. One of
them  was  trained  using  the  information  extracted  from  the  current  single  images  of  the  temporal 
pairs.  The  other  classifier  was  trained  using  the  information  extracted  from  the  prior  single 
images  of  the  temporal  pairs.  Comparison  of  the  three  classifiers  will  reveal  the  effectiveness  of 
interval  change  analysis  for  the  classification  of  malignant  and  benign  masses. 

In this specific study we decided to use the LDA classifier in order to have a linear combination
of the features. The neural network classifier (NNC) combines the input features in a nonlinear
way, which would make the analysis and comparisons more complicated. The use of an NNC would
involve uncertainties about the structure of the neural network (number of hidden layers, number of
neurons in the hidden layers), the number of iterations for training the NNC, and how to keep the
NNC from being overtrained. One of the initial aims of the recent study was to find the useful subset
of features. For this purpose we used feature selection extensively, which in the case of an NNC is very
computationally intensive.

At this stage of our research we wanted a simple classification method that allowed
us to find useful features, and to design and compare different classifiers with different input
feature spaces (classifiers based on current and prior images) efficiently. This is the reason we
concentrated mainly on the use of the LDA classification schemes.


(G) Evaluate the effectiveness of developed classifiers using receiver operating
characteristic methodology (Task 13)

To evaluate the classifier performance, the training and test discriminant scores were
analyzed using receiver operating characteristic (ROC) methodology [26]. The discriminant scores
of the malignant and benign masses were used as decision variables in the LABROC1 program
[27], which fits a binormal ROC curve based on maximum likelihood estimation. The
classification accuracy was evaluated as the area under the ROC curve, Az. The performances of
the classifiers were also assessed by estimating the partial area index, Az(0.90). The partial area
index is defined as the area that lies under the ROC curve but above a sensitivity threshold
of 0.9 (TPF0 = 0.9), normalized to the total area above TPF0, (1 - TPF0). Az(0.90) indicates
the performance of the classifier in the high-sensitivity (low false-negative) region, which is most
important for a cancer detection task.
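
The partial area index can be illustrated numerically. The sketch assumes the usual binormal parameterization TPF = Φ(a + b·Φ⁻¹(FPF)) and integrates the part of the curve lying above TPF0 with a midpoint rule. The parameter values and function names are hypothetical, and this is not the LABROC1 fitting procedure itself.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def binormal_tpf(fpf, a, b):
    """True-positive fraction on a binormal ROC curve at false-positive fraction fpf."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))

def partial_area_index(a, b, tpf0=0.9, steps=20000):
    """Area under the ROC curve but above TPF0, divided by (1 - TPF0). Requires b > 0."""
    # FPF at which the curve first reaches the sensitivity threshold TPF0.
    fpf0 = _STD_NORMAL.cdf((_STD_NORMAL.inv_cdf(tpf0) - a) / b)
    h = (1.0 - fpf0) / steps
    area = 0.0
    for i in range(steps):
        fpf = fpf0 + (i + 0.5) * h  # midpoint, never exactly 0 or 1
        area += (binormal_tpf(fpf, a, b) - tpf0) * h
    return area / (1.0 - tpf0)

# Chance-level curve (a = 0, b = 1) and a hypothetical better classifier (a = 2, b = 1).
chance_index = partial_area_index(0.0, 1.0)
better_index = partial_area_index(2.0, 1.0)
```

For the chance diagonal the index equals (1 - TPF0)/2, i.e. 0.05 at TPF0 = 0.9, which gives a reference point for reading partial area values: a perfect classifier would reach 1.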

The  performances  of  the  classifiers  based  on  the  temporal  pairs,  the  current  images,  and  the 
prior images are summarized in Table 1. The classifiers that achieved the highest test Az values
with a small average number of features are presented here. Table 2 is a summary of the features
selected  for  each  classifier. 

For  the  56  training  subsets  of  temporal  pairs  used  in  this  study,  an  average  of  10  features 
were  selected  for  the  classification  task.  The  most  frequently  selected  features  included  4 
difference  RLS  features  (3  SRE  and  1  LRE),  4  RLS  features  (2  SRE,  1  RLN  and  1  RP),  1 
spiculation  feature  from  the  current  image,  and  1  spiculation  feature  from  the  prior  image  (Table 
2). The LDA classifier achieved an average training Az of 0.92 and a test Az of 0.88. The test
partial Az(0.90) was 0.37.

Table  1.  Classification  results  for  the  classifier  based  on  the  temporal  change  information,  the 
classifier  based  on  current  single  image  information,  and  the  classifier  based  on  prior 
single  image  information. 


Classification    Avg. no. of selected features    Training Az    Test Az        Test partial Az(0.90)
Temporal pairs    10                               0.92           0.88 ± 0.03    0.37 ± 0.10
Current images    11                               0.90           0.82 ± 0.04    0.32 ± 0.08
Prior images      4                                0.78           0.76 ± 0.04    0.24 ± 0.08

(H) Identify the preferred features and classification methods (Task 14)

Texture  and  spiculation  features  were  important  for  malignant  and  benign  classification  of 
mammographic  masses  for  all  three  types  of  classifiers:  the  classifier  based  on  temporal  pair 
information,  the  classifier  based  on  current  image  information,  and  the  classifier  based  on  prior 
image  information  [23].  One  or  more  of  the  spiculation  features  were  always  selected  in  all 
training  partitions  for  all  three  classifiers.  The  most  frequently  selected  texture  features  were  the 
short run emphasis (SRE) features. They comprised more than 50% of the texture features
selected  for  the  three  classifiers  (Table  2). 

The temporal-information-based classifier showed improved performance compared to the
classifiers based on current or prior image information alone. The input feature space to the
temporal-information-based classifier included the current, prior, and difference features. This
allows the classifier to choose the individual features or the difference features. Using the stepwise
feature selection procedure and the linear discriminant classifier, it was found that the texture and
the spiculation features contained useful temporal information to perform malignant and benign
mass classification. Texture features appeared to provide the best information through the difference
features obtained by subtracting the prior from the corresponding current features (SRE and LRE
difference features). On the other hand, the best use of the spiculation features appeared to be a
direct combination of current and prior features in the input feature vector by the LDA, since the
individual features were chosen.

We  found  that  better  feature  subsets  could  be  selected  by  the  stepwise  feature  selection  in  the 
subspaces  than  in  the  entire  feature  space.  For  example,  for  the  temporal-information-based 
classifier,  a  better  feature  subset  with  a  higher  test  Az  at  0.88  was  found  when  the  input  feature 
space  included  only  the  texture  and  spiculation  subspaces.  The  addition  of  the  morphological 
feature  subspace  to  the  input  feature  space  reduced  the  highest  test  Az  to  0.84.  Similarly,  in  the 
case  of  the  classifier  based  on  prior  image  information,  a  better  feature  subset  was  obtained  when 
the  texture  and  spiculation  feature  subspaces  were  used  in  the  input  feature  space  for  stepwise 
feature  selection.  Again  the  addition  of  the  morphological  feature  subspace  to  the  input  feature 
space  reduced  the  highest  test  Az  to  0.72.  The  classifier  based  on  current  image  information  was 
the  only  one,  among  the  three,  that  obtained  a  better  result,  as  shown  in  Table  1,  when  the 
morphological  feature  subspace  was  included  in  the  input  feature  space. 

One reason for the poor performance of the morphological features may be
that the masses were more subtle in the prior images. In fact, the experienced MQSA
mammographer was not confident in seeing 25 of the "masses" on the prior images and could not
provide a mass size estimation for them. Although the active contour model would stop the
iteration based on the preset criteria and find an “outline” of the masses on the prior
mammograms, generally these mass outlines were less reliable than those on the current mammograms in
providing morphological characteristics of the masses. Texture features did not depend as strongly
on  the  precise  mass  boundary  as  morphological  features.  Three  out  of  the  four  features  selected  for 
classification  of  the  malignant  and  benign  masses  on  the  prior  images  were  RLS  texture  features. 
A  spiculation  feature  was  also  found  to  be  a  good  discriminator. 

In  this  study,  we  employed  a  simple  measure  of  temporal  change  by  taking  the  difference 
between  the  feature  from  the  current  mass  and  the  corresponding  feature  from  the  prior  mass.  We 
observed  improvement  in  classification  with  this  simple  temporal  information.  It  will  be  important 
to evaluate other similarity measures that can characterize small differences in image features of the
object  of  interest.  It  can  be  expected  that  a  more  sensitive  similarity  measure  will  provide  a  better 
measurement  of  dissimilarity,  or  difference,  between  the  current  and  prior  masses  and  further 
improve  the  utilization  of  the  temporal  change  information  on  mammograms. 
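
The simple temporal-change measure described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the dictionary layout and the feature names are hypothetical, and only the current-minus-prior subtraction is taken from the text.

```python
def difference_features(current, prior):
    """Return current-minus-prior values for every feature present in both masses.

    `current` and `prior` map a feature name to its value; the names used in the
    example below are placeholders, not the report's actual feature labels.
    """
    return {name: current[name] - prior[name] for name in current if name in prior}


# Hypothetical feature values for one temporal pair.
current_mass = {"texture_a": 0.75, "spiculation_a": 2.5}
prior_mass = {"texture_a": 0.5, "spiculation_a": 1.0}
diff = difference_features(current_mass, prior_mass)
```

The current, prior, and difference values can then be offered together to the feature selection step, so that the classifier is free to choose any of the three.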

(I) Compare the accuracy of computerized classification with the malignancy assessment of
radiologists (Task 15).

We  performed  ROC  analysis  of  the  malignancy  confidence  ratings  provided  by  the 
experienced  MQSA  radiologist  for  the  current  image  data  set  (120  images)  [23].  The  radiologist 
estimated the likelihood of malignancy of the current masses on a 10-point confidence scale
(1: definitely benign; 10: definitely malignant) based on the 120 current mammograms alone,
without comparison with the prior.

The  malignancy  ratings  resulted  in  an  Az  value  of  0.80±0.04.  This  indicates  that  the  masses 
in  the  current  mammograms  cannot  be  easily  distinguished  as  malignant  or  benign  even  by  an 
experienced  radiologist,  consistent  with  the  fact  that  all  lesions  had  indeed  undergone  biopsy.  The 
classifier  based  on  the  current  image  information  has  an  Az  value  of  0.82±0.04,  similar  to  the 
accuracy  of  the  radiologist  for  this  data  set. 
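
To illustrate how an area under the ROC curve can be obtained from confidence ratings of this kind, the sketch below computes the nonparametric (Mann-Whitney) estimate. It is a stand-in only: this section does not describe the curve-fitting procedure behind the reported Az values, and the function name and the example ratings are hypothetical.

```python
def empirical_auc(malignant_ratings, benign_ratings):
    """Nonparametric area under the ROC curve from two lists of confidence ratings.

    Counts the fraction of (malignant, benign) pairs in which the malignant case
    received the higher rating; ties count as half.
    """
    wins = 0.0
    for m in malignant_ratings:
        for b in benign_ratings:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_ratings) * len(benign_ratings))


# Hypothetical ratings on the 10-point scale (not data from the study).
auc = empirical_auc([8, 9, 5], [2, 5, 7])
```

A fitted (for example, binormal) ROC curve will generally give a somewhat different area than this empirical estimate, especially with a coarse rating scale.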


Table  2.  Selected  features  for  classifiers  based  on  temporal  pairs,  current  images,  and  prior  images. 
The  letter  “H”  or  “V”  at  the  beginning  of  the  texture  feature  labels  indicates  that  the 
features  were  extracted  from  the  horizontal  or  vertical  gradient  magnitude  images, 
respectively.  The  number  (0  or  90)  at  the  end  of  the  texture  feature  labels  shows  the 
direction  at  which  the  features  were  extracted. 


Columns: Temporal pairs (Curr, Pr, Diff); Current images (Curr); Prior images (Pr)

Feature type     Group      Selected (X)
Texture          SRE        X X X X X X X X
                 LRE        X X X X X
                 ESSSI^H    X X
Spiculation                 X X
                 AVG        X X
Morphological    CRR        X
                 NRLZCC     X
                 PERIM      X
                 NRLAVG     X
                 SOR        X
                 CONT       X

(J) Evaluate usefulness of temporal features for CAD by comparison of classification based
on temporal features with classification based on features extracted from the current
mammogram alone (Task 15)

For classification of malignant and benign masses using the current single images (the 120
current images of the temporal pairs), the LDA classifier selected an average of 11 features for
the 56 training subsets [23]. The most frequently selected features were 4 RLS features (2 SRE,
1 LRE and 1 RLN), 1 spiculation feature, and 6 morphological features (Table 2). The classifier
achieved an average training Az of 0.90, a test Az of 0.82, and a test partial Az(0.90) of 0.32.

For the classification of masses based on the prior single images alone, an average of 4
features were selected for the 56 training subsets. The most frequently selected features were 3
RLS features (1 SRE, 1 LRE, and 1 RP) and 1 spiculation feature. The LDA classifier achieved an
average training Az of 0.78, a test Az of 0.76, and a test partial Az(0.90) of 0.24.

The  difference  in  the  test  Az  between  the  classifier  based  on  the  temporal  pairs  and  that  based 
on  the  current  images  alone  is  statistically  significant  (p=0.015).  The  difference  in  the  test  Az 
between  the  classifier  based  on  the  temporal  pairs  and  that  based  on  the  prior  images  alone  is  also 
statistically  significant  (p=0.001).  The  partial  area  index  for  the  classifier  based  on  the  temporal 
pairs  is  also  improved  compared  to  the  classifiers  based  on  the  current  or  the  prior  images  alone, 
although  the  differences  did  not  achieve  statistical  significance. 

(K) Perform pilot ROC study for the design of a full-scale ROC experiment (Task 15 - final
step of the project).

We  have  performed  a  pilot  study  as  a  first  step  to  design  an  observer  performance  experiment 
with  ROC  methodology  to  evaluate  the  effects  of  computer  classification  on  radiologists’ 
estimates  of  the  likelihood  of  malignancy  of  masses.  A  graphical  user  interface  was  developed 
on  a  PC  to  display  side-by-side  the  temporal  pairs  of  masses  in  a  predesigned  random  order  for 
each observer. The radiologist's likelihood-of-malignancy estimate and BI-RADS assessment
for each pair are recorded automatically when they are selected on a slider.

A total of 253 temporal image pairs (136 malignant and 117 benign) from 95 patients containing
masses on serial mammograms were chosen from patient files and digitized. Additional pairs
containing  normal  structures  were  also  included  to  simulate  a  more  realistic  clinical  situation.  The 
true  mass  locations  were  identified  by  an  experienced  radiologist  on  all  mammograms.  Regions  of 
interest  containing  the  corresponding  masses  were  then  extracted  from  the  current  and  prior 
mammograms  of  each  temporal  pair  and  analyzed  by  the  CAD  program.  All  cases  eventually 
underwent  biopsy  so  that  interval  change  was  observed  for  most  of  the  masses  even  if  they  were 
found  to  be  benign  after  biopsy.  This  was  therefore  a  difficult  data  set  for  interval  change  analysis. 

Two radiologists assessed the temporal pairs displayed on the PC workstation. They provided
estimates of the likelihood of malignancy and BI-RADS assessments, first without and then with
CAD. The reading order of the temporal pairs was randomized for each observer. The
classification accuracy was quantified by the area under the ROC curve, Az.

For  this  data  set,  the  computer  classifier  achieved  a  test  Az  value  of  0.86.  The  radiologists’ 
Az  values  for  the  likelihood  of  malignancy  were  0.72  and  0.74  without  CAD,  and  improved  to  0.76 
and  0.75,  respectively,  with  CAD.  The  improvement  was  statistically  significant  (p=0.0006)  for  the 
first  radiologist.  For  the  BI-RADS  assessments,  the  two  radiologists  obtained  Az  values  of  0.67 
and  0.77  without  CAD  and  improved  to  0.73  and  0.79,  respectively,  with  CAD.  The 
improvements  were  also  statistically  significant  (p<0.001). 

This  pilot  study  indicates  that  CAD  using  interval  change  analysis  may  be  useful  for  assisting 
radiologists  in  classification  of  masses  and  thereby  reducing  unnecessary  biopsies. 

This pilot study will be the basis for our design of a full-scale ROC study. We have already
recruited 6 radiologists to participate as observers. The results described above suggest that the
study design will likely produce statistically significant results. The sample size is acceptable, but
we are continuing to enlarge the data set until the ROC study design is finalized. We expect that
this ROC study can be completed within the no-cost extension year that we requested. This
type of observer study is new, and its outcome will be important, providing a new understanding
of the potential of computer aids to assist radiologists in characterizing the temporal changes of
mammographic masses.

(L) Extension of the developed methods to the microcalcifications classification.

We were encouraged by the above results and were able to begin transferring the methods
developed for detection and classification of temporal masses to the detection and classification
of temporal microcalcification clusters. We carried out a preliminary study, which showed
promising results, and we applied for a BCRP-01 IDEA grant at the U.S. Army Medical Research
and Materiel Command. The research grant was approved, and we are very enthusiastic and
encouraged that we will have the opportunity to extend the already developed methods and to
design new methods for detection, classification, and analysis of temporal microcalcification
clusters.

Some  preliminary  results  were  presented  at  RSNA  2001  [24],  and  SPIE  2002  [25]. 

(6)  Key  research  accomplishments  in  current  year  as  a  result  of  this  grant 

•  Database  collection  and  extraction  of  regions  of  interest  (Task  1). 

•  Further  development  of  methods  for  establishing  corresponding  locations  in  current  and 
previous  mammograms  (Task  3). 

•  Obtaining  hand  drawn  mass  boundaries  from  radiologists  and  evaluation  of  segmentation 
accuracy  (Task  9). 

•  Further develop methods for extracting morphological and texture features from masses
segmented from ROIs extracted from current and prior mammograms (Task 10).

•  Analyze techniques for characterizing differences in these features (Task 11).

•  Evaluate the effectiveness of LDA classifiers and neural networks for classification (Task 12).

•  Evaluate  the  effectiveness  of  developed  classifiers  using  receiver  operating  characteristic 
methodology  (Task  13) 

•  Identify  the  preferred  features  and  classification  methods  (Task  14) 

•  Compare  the  accuracy  of  computerized  classification  with  the  malignancy  assessment  of 
radiologists  (Task  15). 

•  Evaluate  usefulness  of  temporal  features  for  CAD  by  comparison  of  classification  based  on 
temporal  features  with  classification  based  on  features  extracted  from  the  current 
mammogram  alone  (Task  15) 

•  Perform pilot ROC study for the design of a full-scale ROC experiment (Task 15 - final step
of the project).

•  Extension  of  the  developed  methods  to  the  microcalcifications  classification. 


(7)  Reportable  Outcomes 

Publications  in  current  year  as  a  result  of  this  grant 

[1] L. Hadjiiski, H.P. Chan, B. Sahiner, N. Petrick, M. Helvie, "Automated Registration of
Breast Lesions in Temporal Pairs of Mammograms for Interval Change Analysis - Local
Affine Transformation for Improved Localization", Medical Physics, 28 (6), June 2001,
pp. 1070-1079.

[2] L. Hadjiiski, B. Sahiner, H.P. Chan, N. Petrick, M. Helvie, M. Gurcan, "Analysis of
temporal changes of mammographic features: Computer-aided classification of malignant
and benign breast masses", Medical Physics, 28 (11), November 2001, pp. 2309-2317.

[3] L. Hadjiiski, B. Sahiner, H.P. Chan, N. Petrick, M.A. Helvie, M. Gurcan, "Analysis of
temporal change of mammographic features for computer-aided characterization of
malignant and benign masses", Oral Presentation at SPIE International Symposium on
Medical Imaging, San Diego, California, February 19-22, 2001, Proc. SPIE Medical
Imaging, 2001, 4322, pp. 661-666.

[4] L. Hadjiiski, B. Sahiner, H.P. Chan, N. Petrick, M.A. Helvie, "An Adaptive Similarity
Measure for Automated Identification of Breast Lesions in Temporal Pairs of Mammograms
for Interval Change Analysis", To be presented at the 6th International Workshop for Digital
Mammography (IWDM), Bremen, Germany, June 22-25, 2002, To appear in Proc. IWDM
2002.

[5] L. Hadjiiski, H.P. Chan, N. Petrick, B. Sahiner, M. Gurcan, M.A. Helvie, et al.,
"Computerized Regional Registration of Corresponding Microcalcification Clusters on
Temporal Pairs of Mammograms for Interval Change Analysis", Presented at the 87th
Scientific Assembly and Annual Meeting of the Radiological Society of North America
(RSNA), Chicago, Illinois, November 25-30, 2001. Radiology 2001; 221 (P): 425.

[6] L. Hadjiiski, H.P. Chan, M. Gurcan, B. Sahiner, N. Petrick, M.A. Helvie, M. Roubidoux,
"Computer-Aided Characterization of Malignant and Benign Microcalcification Clusters
Based on the Analysis of Temporal Change of Mammographic Features", Presented at the
SPIE International Symposium on Medical Imaging, San Diego, California, February 23-28,
2002. To appear in Proc. SPIE Medical Imaging 2002.


Copies  of  publications  are  enclosed  with  this  report. 


(8)  Conclusion 

During  this  year,  we  have  continued  the  development  of  the  regional  registration  technique. 
The  adaptive  similarity  measure  (ASM)  improves  the  localization  of  the  corresponding  mass  on  the 
prior mammogram. A total of 179 temporal pairs of mammograms containing biopsy-proven
masses were used for evaluation of the detection accuracy. Of the estimated lesion locations, 86%
resulted in an area overlap of at least 50% with the true lesion locations. The average distance
between the estimated and the true centroids of the lesions on the prior mammogram was
4.5±6.7 mm. In comparison, the correct localization rate and the average distance using a
conventional correlation similarity measure were 84% and 4.9±7.0 mm, respectively. The
registration accuracy of the current method has thus been improved in comparison with that
without ASM. This result indicates that our technique is a promising approach for identification
of corresponding lesions on temporal pairs of mammograms and thus may be used as a basis for
analysis of interval change on mammograms. We will continue to enlarge the data set and improve
the registration method in the coming year.
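
The two localization measures quoted above (fraction of cases with at least 50% area overlap, and mean centroid distance) can be sketched as below. This is an assumption-laden illustration: lesions are represented by axis-aligned bounding boxes, and overlap is taken as the fraction of the true box covered by the estimated box. The report does not state how overlap was defined, and all function names are hypothetical.

```python
import math


def box_overlap_fraction(est, true):
    """Fraction of the true box covered by the estimated box.

    Boxes are (x0, y0, x1, y1). This overlap definition is an assumption.
    """
    w = min(est[2], true[2]) - max(est[0], true[0])
    h = min(est[3], true[3]) - max(est[1], true[1])
    if w <= 0 or h <= 0:
        return 0.0
    true_area = (true[2] - true[0]) * (true[3] - true[1])
    return (w * h) / true_area


def centroid_distance(est, true):
    """Euclidean distance between the centers of two boxes."""
    ex, ey = (est[0] + est[2]) / 2, (est[1] + est[3]) / 2
    tx, ty = (true[0] + true[2]) / 2, (true[1] + true[3]) / 2
    return math.hypot(ex - tx, ey - ty)


def localization_summary(pairs, min_overlap=0.5):
    """Return (fraction of pairs meeting the overlap threshold, mean centroid distance)."""
    hits = sum(1 for est, true in pairs if box_overlap_fraction(est, true) >= min_overlap)
    dists = [centroid_distance(est, true) for est, true in pairs]
    return hits / len(pairs), sum(dists) / len(dists)


# Two hypothetical cases: one perfect match and one complete miss.
example_pairs = [((0, 0, 10, 10), (0, 0, 10, 10)), ((20, 0, 30, 10), (0, 0, 10, 10))]
rate, mean_dist = localization_summary(example_pairs)
```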

To evaluate the accuracy of computer segmentation of masses, 239 regions of interest
containing the corresponding masses were identified by an MQSA radiologist on the current and
prior mammograms of the temporal pairs. The masses were automatically segmented using a
K-means clustering algorithm and an active contour model. Additionally, hand-drawn mass
boundaries were obtained from radiologists and compared with the computer segmentations. The
initial mass segmentation by the K-means clustering algorithm was satisfactory (average area
overlap of 40%, average Hausdorff distance of 5.58 mm, and average Hausdorff distance of
2.19 mm averaged over 239 ROIs). The active contour model further improved the accuracy of
mass segmentation (average area overlap of 67%, average Hausdorff distance of 4.49 mm, and
average Hausdorff distance of 1.27 mm averaged over 239 ROIs). The active contour model is
therefore useful for precise mass segmentation.
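
For readers unfamiliar with the boundary-distance measure used above, the following sketch computes the standard (symmetric) Hausdorff distance between two contours given as lists of points. It is a generic textbook definition, not the report's implementation, and the sample contours are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in contour `a` to its nearest point in contour `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical contours (coordinates in mm).
computer_contour = [(0.0, 0.0), (1.0, 0.0)]
hand_drawn_contour = [(0.0, 0.0), (4.0, 0.0)]
d = hausdorff(computer_contour, hand_drawn_contour)
```

Because it is driven by the single worst-matching point, the Hausdorff distance is sensitive to local boundary errors that an area-overlap measure may hide.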

For  the  task  of  feature  extraction,  we  evaluated  35  features  (20  texture,  12  morphological  and 
3  spiculation)  extracted  from  each  mass.  Additional  difference  features  were  obtained  by 
subtracting  the  features  of  the  prior  mass  from  those  of  the  current  mass.  Therefore,  35  difference 
features  were  derived  from  the  20  texture,  12  morphological  and  3  spiculation  features.  The  feature 
space  for  each  temporal  pair  consisted  of  the  texture,  spiculation  and  morphological  features  from 
both  the  prior  and  the  current  mammograms  and  the  difference  features.  These  features  were 
evaluated  for  their  effectiveness  in  classification  of  malignant  and  benign  temporal  masses  as  well 
as  detection  of  temporal  change. 

We designed a new classification scheme that merges current and prior information directly.
The input feature space to the classifier included the current, prior, and difference features. This
allowed the classifier to choose individual current and prior features or difference features in
order to obtain the best combination of features for high classification accuracy and optimal
detection of interval change. It was found that the difference
RLS  texture  features  and  spiculation  features  were  useful  for  identification  of  malignancy  in 
temporal  pairs  of  mammograms.  The  information  on  the  prior  image  was  important  for 
characterization  of  the  masses;  5  out  of  the  10  selected  features  contained  prior  information.  We 
found  that  the  mass  size  descriptors  were  not  discriminatory  features  for  these  difficult  cases 
because  many  of  the  benign  masses  also  grew  over  time.  In  comparison  with  the  classification 
based  on  image  information  from  the  current  images  alone,  the  temporal  change  information 
significantly  (p=0.015)  improved  the  accuracy  for  classification  of  the  masses  in  terms  of  the  total 
area under the ROC curve (Az). The partial area under the ROC curve for the classifier based on
the temporal pairs (Az(0.90) = 0.37) is also improved compared to the classifier based only on
the current images (Az(0.90) = 0.32), although the difference did not achieve statistical
significance.

We  performed  a  pilot  study  for  the  design  of  observer  performance  experiments  with  ROC 
methodology  to  evaluate  the  effects  of  computer  classification  on  radiologists’  estimates  of  the 
likelihood  of  malignancy  of  masses.  Two  radiologists  read  a  data  set  of  temporal  pairs.  For  this 
data  set,  the  computer  classifier  achieved  a  test  Az  value  of  0.86.  The  radiologists’  Az  values  for 
the  likelihood  of  malignancy  were  0.72  and  0.74  without  CAD,  and  improved  to  0.76  and  0.75, 
respectively,  with  CAD.  The  improvement  was  statistically  significant  (p=0.0006)  for  the  first 
radiologist.  For  the  BI-RADS  assessments,  the  two  radiologists  obtained  Az  values  of  0.67  and 
0.77  without  CAD  and  improved  to  0.73  and  0.79,  respectively,  with  CAD.  The  improvements 
were  also  statistically  significant  (p<0.001).  This  pilot  study  will  be  the  basis  for  our  design  of  a 
full-scale  ROC  study  to  evaluate  the  effects  of  CAD  of  interval  changes  on  the  performance  of 
radiologists.

Further study is underway to develop a feature matching method to improve lesion
localization within the search region. We will continue the development of automated methods
to extract and analyze features from corresponding masses on a temporal pair of mammograms
for analysis of the temporal changes.

We were able to begin transferring the methods developed for detection and classification of
temporal masses to the detection and classification of temporal microcalcification clusters. We
carried out a preliminary study, which showed promising results, and we applied for an IDEA
grant at the U.S. Army Medical Research and Materiel Command. The research grant was
approved, and we will have the opportunity to extend the already developed methods and to design
new methods for detection, classification, and analysis of temporal microcalcification clusters.

Analysis of interval change is important for mammographic interpretation. The aim of this study is
to evaluate the use of an automated registration technique for computer-aided interval change
analysis in mammography. Previously we developed a regional registration technique for
identifying masses on temporal pairs of mammograms. In the current study, we improved lesion
registration by including a local alignment step. Initially, the lesion position on the prior
mammogram was estimated based on the breast geometry. An initial fan-shaped search region was
then defined on the prior mammogram. In the second stage, the location of the fan-shaped region
on the prior mammogram was refined by warping, based on an affine transformation and simplex
optimization in a local region. In the third stage, a search for the best match between the lesion
template from the current mammogram and a structure on the prior mammogram was carried out
within the search region. This technique was evaluated on 124 temporal pairs of mammograms
containing biopsy-proven masses. Eighty-seven percent of the estimated lesion locations resulted
in an area overlap of at least 50% with the true lesion locations and an average distance of
2.4±2.1 mm between their centroids. The average distance between the estimated and the true
centroid of the lesions on the prior mammogram over all 124 temporal pairs was 4.2±5.7 mm. The
registration accuracy was improved in comparison with our previous study that used a data set of
74 temporal pairs of mammograms. This improvement in accuracy resulted from the improved
geometry estimation and the local affine transformation. © 2001 American Association of
Physicists in Medicine.
[DOI: 10.1118/1.1376134]

I. INTRODUCTION

Mammography is currently the most effective method for early breast cancer detection. One of
the important techniques used by radiologists in mammographic interpretation to detect
developing malignancy is analysis of interval changes. A variety of computer-aided diagnosis
(CAD) techniques have been developed to detect mammographic abnormalities and to distinguish
between malignant and benign lesions. We are studying the use of CAD techniques to assist
radiologists in interval change analysis.

Sallam et al. have proposed a warping technique for mammogram registration based on
manually identified control points. A mapping function was calculated for mapping each point on
the current mammogram to a point on the prior mammogram. Brzakovic et al. have investigated a
three-step method for comparison of the most recent and the prior mammograms. They first
registered two mammograms using the method of principal axis, and partitioned the current
mammogram using a hierarchical region-growing technique. Translation, rotation, and scaling
were then used for registration of the partitioned regions. Vujovic et al. have proposed a
multiple-control-point technique for mammogram registration. They first determined several
control points independently on the current and prior mammograms based on the intersection
points of prominent anatomical structures in the breast. A correspondence between these control
points was established based on a search in a local neighborhood around the control point of
interest.

The previous techniques depend on the identification of control points. However, because the
breast is mainly composed of soft tissue that can change over time, there are no obvious
landmarks on mammograms. The crossing line structures are often fibrous tissue from different
depths of the breast which overlap in a projection image. These crossing points are not invariant
landmarks on different mammograms. Because of the elasticity of the breast tissue, there is large
variability in the positioning and compression used in mammographic examination. As a result,
the relative positions of the breast tissues projected onto a mammogram vary from one
examination to the other. Techniques that depend on identification of control points may not be
generally applicable to registration of breast images.

Gopal et al. and Hadjiiski et al. have developed a multistage technique that defines the
transformation to locally map the position of the mass on a current mammogram to that of the
prior mammogram. A local search for the mass is then performed on the prior mammogram. Good
et al. also have developed a technique that defines a transformation to map all points from the
current mammogram onto a prior mammogram. The current mammogram is then subtracted from
the prior mammogram.

Fig. 1. Block diagram of the regional registration technique (inputs: current mammogram and
prior mammogram; output: identification of corresponding lesion).

The goal of our research is to develop a technique for computerized analysis of temporal
differences between a mass on the most recent mammogram and a prior mammogram of the same
view. The computer algorithm will assist radiologists in quantifying interval changes and thus
distinguishing between benign and malignant masses for CAD. When fully developed, the
technique will be applied to a mass on the current mammogram identified either by the
radiologist or by an automated mass detection program, so that the interval change analysis can
be an integrated part of an automated CAD system. In this study, we focused on the development
of an automated registration technique that localizes the corresponding mass on the prior
mammogram when the mass on the current mammogram is known. Therefore, we used the
radiologist-identified mass location on the current mammogram as a starting point and that on
the prior mammogram as the ground truth for evaluation of the registration technique. A local
registration technique was developed based on an affine transformation and simplex
optimization, and its usefulness in improving the localization of the mass on the prior
mammogram was investigated.

II.  REGISTRATION  TECHNIQUE 

A multistage regional registration technique was developed for identifying corresponding
masses on temporal pairs of mammograms. The block diagram of the regional registration
technique is shown in Fig. 1. In the first stage, an initial fan-shaped search region was defined on
the prior mammogram based on the mass location on the current mammogram. In the second
local alignment stage, the location of the search region on the prior mammograms was first
refined by maximizing a correlation measure between a template of the fan-shaped region
centered at the mass extracted from the current mammogram and the breast structures on the
prior mammogram. The affine transformation in combination with simplex optimization was then
employed to warp this local region and further improve the correlation. In the final stage, a search
for the best match between the lesion template from the current mammogram and a structure on
the prior mammogram was carried out within the refined search region. A more detailed
explanation for each of the stages will be presented in the following subsections.

Fig. 2. An example of a pair of current and prior mediolateral oblique mammograms in our data
set. The arrows point to the masses on the current and the prior mammograms. The white lines
represent the breast boundary determined by the automated boundary detection procedure.

A.  Stage  1 — Initial  estimate  of  search  region 

We have modified our previous method to define a fan-shaped search region on the prior
mammogram. Initially an automated procedure is used to detect the breast boundary on the
mammograms (Fig. 2). The location of the mass on the current mammogram is determined in a
polar coordinate system with the nipple as the origin. By using the radial distance $R_{curr}$
between the nipple and mass centroid, |NM|, an arc is drawn which intersects the breast boundary
at points A and B (Fig. 3). Three angles are estimated at the radial distance $R_{curr}$: the angle
$\beta$ between NM and NA, the angle $\varphi$ between NM and NB, and the angle $\theta$
between NA and NB. The location of the mass is determined by $R_{curr}$ and the angle $\beta$
or $\varphi$. The angle $\theta$ is the breast width at the radial distance $R_{curr}$. Using the
radial distance $R_{curr}$ to draw an arc centered at the nipple centroid on the prior mammogram,
N', the two intersection points A' and B' with the breast boundary on the prior mammogram are
determined. The angle $\theta_p$ between the axes N'A' and N'B' is estimated. An angular scaling
factor $\alpha$ can be calculated as the ratio of the prior and the current angles,
$\alpha = \theta_p / \theta$.

Fig. 3. Initial estimation of the mass location on the prior mammogram, based on the nipple-mass
centroid distance and an angular distance from the breast periphery on the current mammogram.

Fig. 4. Definition of an initial fan-shaped search region on the prior mammogram and a
fan-shaped template on the current mammogram.

In order to predict the angular location of the mass on the prior mammogram, the smaller of
the angles $\beta$ and $\varphi$ is selected as the angular coordinate of the mass on the current
mammogram. The smaller angle is used because we found by experiment that it produces a
smaller angular deviation error than using the larger angle. The angular deviation error is defined
as the angle between the axis connecting the nipple and the true mass centroid and the axis
connecting the nipple and the predicted mass centroid on the prior mammogram. The selected
angle, multiplied by the angular scaling factor $\alpha$, is used as the predicted angle from the
corresponding axis on the prior mammogram. The radial distance $R_{curr}$ is used to predict the
radial position of the mass on the prior mammogram.
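
The Stage 1 prediction can be sketched in a few lines. This is an illustrative reading of the description above rather than the authors' code: the function name, argument names, and the returned axis label are hypothetical, and angles are assumed to be in radians.

```python
def predict_prior_polar(r_curr, beta, phi, theta, theta_prior):
    """Predict the mass position on the prior mammogram in nipple-centered polar form.

    r_curr      : nipple-to-mass-centroid distance on the current mammogram
    beta, phi   : angles from NM to NA and from NM to NB on the current mammogram
    theta       : angle between NA and NB on the current mammogram
    theta_prior : angle between N'A' and N'B' on the prior mammogram

    Returns (radial distance, predicted angle, reference axis), where the reference
    axis label "A" or "B" says which boundary axis the angle is measured from.
    """
    alpha = theta_prior / theta      # angular scaling factor
    if beta <= phi:                  # use the smaller of the two angles
        return r_curr, alpha * beta, "A"
    return r_curr, alpha * phi, "B"


# Hypothetical geometry: the prior breast outline is twice as wide at this radius.
r_pred, angle_pred, axis = predict_prior_polar(50.0, 0.5, 1.0, 1.5, 3.0)
```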

An initial fan-shaped search region is then defined on the prior mammogram centered at the predicted location of the mass centroid (Fig. 4). The size of the fan-shaped region was estimated previously^{10} to have the form $\epsilon = k_1 + k_2/R_{curr}$, $\delta = k_3$, where $2\epsilon$ determines the angular width and $2\delta$ determines the radial length of the fan-shaped region. The constants $k_1$, $k_2$, and $k_3$ were chosen experimentally such that the estimated fan-shaped regions will essentially include all mass centroids on the prior mammograms. A fan-shaped template centered at the mass is also defined on the current mammogram. More details on defining the fan-shaped region can be found in Appendix A and in Ref. 10.
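
The Stage 1 prediction and the fan-shaped region can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the polar convention (angles measured at the nipple from the reference axis, distances in mm), and the example inputs are assumptions made for illustration. The default constants 0.25, 5, and 20 mm are the values reported in the Registration Results section.

```python
import math

def predict_prior_polar(r_curr_mm, angle_curr_rad, theta_curr_rad, theta_prior_rad):
    """Predict (radius, angle) of the mass on the prior mammogram.

    The radial distance is carried over unchanged; the angle is multiplied
    by the ratio of the prior to the current periphery angle.
    """
    scale = theta_prior_rad / theta_curr_rad
    return r_curr_mm, angle_curr_rad * scale

def fan_half_widths(r_curr_mm, k1=0.25, k2=5.0, k3=20.0):
    """Return (angular half-width in radians, radial half-length in mm)."""
    epsilon = k1 + k2 / r_curr_mm
    delta = k3
    return epsilon, delta

def in_fan(r_mm, angle_rad, r_pred_mm, angle_pred_rad, epsilon, delta):
    """True if the polar point lies inside the fan centered at the prediction."""
    return (abs(r_mm - r_pred_mm) <= delta
            and abs(angle_rad - angle_pred_rad) <= epsilon)

def fan_area_mm2(r_pred_mm, epsilon, delta):
    """Area of the annular sector of angular width 2*epsilon."""
    r_in = max(r_pred_mm - delta, 0.0)
    r_out = r_pred_mm + delta
    return epsilon * (r_out ** 2 - r_in ** 2)

# Hypothetical example: a mass 50 mm from the nipple.
eps, delta = fan_half_widths(50.0)
r_pred, angle_pred = predict_prior_polar(50.0, 0.4, 1.0, 1.1)
```

Because the angular half-width shrinks as the radial distance grows, masses far from the nipple get a narrower fan in angle, while the radial extent stays fixed.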

B.  Stage  2— Refinement  of  search  region  by  warping 
and  alignment 

The second stage combined two procedures. First, the location of the search region on the prior mammograms was refined by maximizing a correlation measure between the fan-shaped template extracted from the current mammogram and the breast structures on the prior mammogram. The template was shifted pixel by pixel within the initial fan-shaped search region and a correlation measure was calculated at each pixel location. The pixel location providing the maximum correlation was used as the center of a refined search region. This is basically a template matching operation. Second, the affine transformation in combination with simplex optimization was iteratively used to warp the fan-shaped template and further maximize the correlation measure with the breast structures on the prior mammogram.

Fig. 5. The fan-shaped template and the fan-shaped template warped by the affine transformation.

1.  Affine  transformation 

An affine transformation is a linear transformation combining scaling, rotation, and translation. A two-dimensional affine transformation is defined as follows:

$$x' = a x + b y + c, \qquad y' = d x + e y + f, \tag{1}$$

where $(x, y)$ are the original coordinates, $(x', y')$ are the transformed coordinates, and $a$, $b$, $d$, $e$, $c$, $f$ are the transformation coefficients. The coefficients $a$, $b$, $d$, $e$ determine a scaling and a rotation, and the coefficients $c$ and $f$ determine a translation. The result of applying the affine transformation of Eq. (1) in combination with the simplex optimization (described below) to refine the fan-shaped search region is shown in Fig. 5. Since the affine transformation is linear, the transformed object is linearly resized and rotated. This can be observed from the edges of the bounding box of the fan-shaped region (white box in Fig. 5). After the transformation the edges are still straight lines; however, the corner angles are different from 90 degrees and the lengths of the lines are linearly scaled.
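
To make the role of the six coefficients concrete, here is a minimal Python sketch of a two-dimensional affine transformation. It assumes the conventional layout $x' = ax + by + c$, $y' = dx + ey + f$, which matches the roles the text gives to the coefficients; the helper names and the example values are hypothetical.

```python
import math

def affine_apply(coeffs, point):
    """Apply x' = a*x + b*y + c, y' = d*x + e*y + f to a 2-D point."""
    a, b, c, d, e, f = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

def affine_from_scale_rotation(sx, sy, angle_rad, tx, ty):
    """Build (a, b, c, d, e, f) for scaling, then rotation, then translation."""
    cos_t, sin_t = math.cos(angle_rad), math.sin(angle_rad)
    return (sx * cos_t, -sy * sin_t, tx,
            sx * sin_t, sy * cos_t, ty)

# Corners of a unit bounding box warped by a hypothetical transformation:
# scale x by 2, rotate by 90 degrees, translate by (10, 5).
box = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
coeffs = affine_from_scale_rotation(2.0, 1.0, math.pi / 2, 10.0, 5.0)
warped = [affine_apply(coeffs, p) for p in box]
```

Straight edges stay straight under this mapping, because the image of a midpoint is the midpoint of the images. Coefficients that are not restricted to scaling and rotation can also shear the box, which is one way corner angles can end up different from 90 degrees.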

2.  Nonlinear  simplex  optimization 

The nonlinear simplex optimization by Nelder and Mead^{14,15} is used to adjust the coefficients $a$, $b$, $c$, $d$, $e$, and $f$ and to warp the fan-shaped template, thereby maximizing the correlation between the template and a breast structure on the prior mammogram. This optimization defines a hyper-polygon. For each vertex an error function is calculated. The polygon is then “rolled” towards the minimum. The movement of the polygon (towards the minimum) is obtained by reflection in the direction opposite to the vertex with the maximal error. Figure 5 shows the result of application of the affine transformation whose coefficients were obtained by the nonlinear simplex optimization. A more detailed discussion on this optimization method can be found in Appendix B and Refs. 14 and 15.

Fig. 6. A refined search region was defined on the prior mammogram. A search for the best match between the mass template from the current mammogram and a structure on the prior mammogram was carried out within the refined search region. (A: mass template on current mammogram, B: warped fan-shaped region from current mammogram, C: refined search region).

C. Stage 3: Mass template matching and localization of corresponding lesion

At this stage a new search region with a reduced size is defined on the prior mammogram (Fig. 6). The reduced size of the search region is determined experimentally by iterative adjustment of the size of the rectangular region, targeting the improvement of the final result. A template containing the mass is extracted from the current mammogram. The mass location on the prior mammogram is then determined by maximizing the correlation between the template and a structure within the search region (Fig. 7).
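
The template matching used in Stages 2 and 3 can be sketched as an exhaustive search over a region. The text does not specify the correlation measure, so this sketch assumes the Pearson (zero-mean, normalized) correlation of gray levels; the function names and the toy image are hypothetical, and images are plain nested lists to keep the example self-contained.

```python
import math

def correlation(patch, template):
    """Pearson correlation between two equally sized 2-D lists of gray levels."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p)
                    * sum((b - mean_t) ** 2 for b in t))
    return num / den if den > 0 else 0.0

def match_template(image, template, region):
    """Shift the template pixel by pixel over the search region.

    `region` = (row0, col0, row1, col1) gives the inclusive range of
    top-left template positions; the caller must keep it inside the image.
    Returns (best_correlation, (row, col)).
    """
    height, width = len(template), len(template[0])
    row0, col0, row1, col1 = region
    best = (-2.0, None)
    for r in range(row0, row1 + 1):
        for c in range(col0, col1 + 1):
            patch = [row[c:c + width] for row in image[r:r + height]]
            score = correlation(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best

image = [[0, 0, 0, 0, 0],
         [0, 1, 2, 0, 0],
         [0, 3, 9, 0, 0],
         [0, 0, 0, 0, 0]]
template = [[1, 2], [3, 9]]
score, location = match_template(image, template, (0, 0, 2, 3))
```

A normalized measure of this kind is insensitive to a constant offset or gain in gray level, which may matter when the current and prior films differ in exposure.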

III.  DATA  SET 

Fig. 7. Final identification of the corresponding mass on the prior mammogram. (A: mass template on current mammogram, B: refined search region, C: identified mass location).

A set of 124 temporal pairs of mammograms containing biopsy-proven masses on the current mammograms was used to examine the performance of this approach. Different mammographic views of the same breast were also included. There were a total of 221 mammograms obtained from 54 cases. Temporal pairs were formed using the temporal sequence from the corresponding view. Some cases contained mammograms of multiple years, and a combination of the mammograms from different prior years with the current-year mammogram formed multiple temporal pairs. Thirty-five of the mammograms were digitized with a LUMISYS DIS-1000 laser scanner at a pixel resolution of 100 µm × 100 µm and 4096 gray levels. The digitizer was calibrated so that gray level values were linearly proportional to the optical density (OD) within the range of 0.1-2.8 OD units, with a slope of 0.001 OD/pixel value. Outside this range, the slope of the calibration curve decreased gradually. The OD range of the digitizer was 0-3.5. The remaining 186 mammograms were digitized with a LUMISCAN 85 laser scanner at a pixel size of 50 µm × 50 µm and 4096 gray levels. The digitizer was calibrated so that the gray level values were linearly proportional to the OD within the range of 0-4 OD units, also with a slope of 0.001 OD/pixel value. Output from both digitizers was linearly converted so that a large pixel value corresponded to a low optical density. In order to process the mammograms digitized with these two different digitizers, the images were first averaged using a filter that has constant weights over the entire filter kernel and then were down-sampled. This filter will be referred to as a box filter. The images digitized with the LUMISCAN 85 digitizer were averaged with a 16×16 box filter and then were down-sampled by a factor of 16. The images digitized with the LUMISYS DIS-1000 digitizer were averaged with an 8×8 box filter and then were down-sampled by a factor of 8. Therefore, all resulting images had a pixel size of 800 µm × 800 µm.
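
The box averaging and down-sampling step can be sketched as follows. This is an illustration under one assumption: averaging with an N×N box filter and then keeping every Nth pixel is treated as averaging non-overlapping N×N blocks, which holds when the retained samples line up with the blocks. The function name is hypothetical.

```python
def box_average_downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# Toy 4 x 4 image reduced by a factor of 2.
img = [[1, 3, 5, 7],
       [1, 3, 5, 7],
       [2, 2, 6, 6],
       [2, 2, 6, 6]]
small = box_average_downsample(img, 2)

# Pixel sizes after down-sampling, in micrometers.
lumiscan_um = 50 * 16
lumisys_um = 100 * 8
```

Both digitizer settings therefore lead to the same 800 µm pixel size, so images from the two scanners can be processed with the same parameters.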

The 54 cases contained 53 biopsy-proven masses and one follow-up mass. The 221 mammograms contained different mammographic views and multiple years of the masses, including the year when the biopsy was performed. Of the 124 temporal pairs of mammograms, 73 were malignant and 51 benign. A malignant temporal pair consists of a biopsy-proven malignant mass or a mass that was followed up and was found to be malignant when a biopsy was performed in a future year. Of the 124 temporal pairs of mammograms, 63 were CC-view pairs, 48 were MLO-view pairs, and 13 were lateral-view pairs. A Mammography Quality Standards Act (MQSA)-approved radiologist read the original mammogram to identify the mass and provide a description of its characteristics. The radiologist defined a bounding box around the mass and marked the nipple location on every film.

The radiologist also measured the mass sizes, defined as the longest dimension of the mass, on both the current and prior mammograms. In Figs. 8(a) and 8(b) the mass sizes on the current mammograms are plotted against those on the prior mammograms for the malignant and the benign temporal pairs, respectively. Only 103 temporal pairs were plotted (54 malignant and 49 benign) because the masses on the prior mammograms in the remaining 21 temporal pairs were too subtle for the radiologist to estimate their boundaries. On average the malignant masses appear to have a larger increase in size than the benign masses. The mean increase in size from prior to current for the malignant masses is 4.2 mm compared to 1.6 mm for the benign masses (p=0.008). The correlation coefficient is 0.71 for the malignant masses and 0.83 for the benign masses [Figs. 8(a) and 8(b)].

Fig. 8. Mass sizes measured by an MQSA-approved radiologist on the current mammograms plotted against those on the prior mammograms for (a) 54 malignant and (b) 49 benign temporal pairs. The diagonal line on the graph represents the case when the current and the prior mass sizes are identical. The dashed lines are the linear regression lines defined by $y = 0.469x + 3.012$ for (a) and by $y = 0.638x + 3.242$ for (b). The correlation coefficient for malignant masses is 0.71 and for benign masses is 0.83.

The radiologist also rated the visibility of the masses on the mammograms relative to those encountered in clinical practice on a 10-point scale, with 1 representing the most obvious and 10 the subtlest masses. The visibility of the masses on the current mammogram is plotted against that on the prior mammogram in Fig. 9 for the 73 malignant and 51 benign temporal pairs. Generally, the malignant masses were less visible on the prior mammograms, while the visibility of the benign masses was more similar between the two exams. The mean difference in visibility between the prior and the current mammograms for the malignant masses is 2.8 compared to 0.7 for the benign masses (p=0.0002). The correlation coefficient is 0.06 for malignant masses and 0.54 for benign masses [Figs. 9(a) and 9(b)]. For most of the temporal pairs the time interval between the current and the prior mammogram was 12 months (Fig. 10).

Fig. 9. Visibility of the masses on the current mammogram plotted against that on the prior mammogram for (a) malignant and (b) benign temporal pairs. The visibility was rated on a 10-point discrete scale (1 = most obvious, 10 = subtlest). Because many of the data points overlap, we indicate the number of points with the same rating by a number next to the symbol (m or b). The diagonal line on the graph represents the case when the current and the prior visibility ratings are identical. The dashed lines are the linear regression lines defined by $y = 0.055x + 7.44$ for (a) and by $y = 0.658x + 2.138$ for (b). The correlation coefficient for malignant masses is 0.06 and for benign masses is 0.54.

IV.  EVALUATION  METHODS 

The accuracy of the multistage regional registration was analyzed in terms of two measures. The first measure is the overlap area between the estimated and the true lesions on the prior mammogram. The fractions of registered temporal pairs that could provide an accuracy of over 50% area overlap and over 75% area overlap were examined. The second measure is the average Euclidean distance between the centroids of the estimated and the true lesion locations.

Fig. 10. Temporal interval between the current and the prior mammograms for the 124 temporal pairs in our data set.
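
The two accuracy measures of Section IV can be computed as in the sketch below. Two points are assumptions, since the text does not spell them out: lesions are represented by axis-aligned bounding boxes, and the area overlap is expressed as a fraction of the true lesion box. All names and the example boxes are hypothetical.

```python
import math

def overlap_fraction(est_box, true_box):
    """Fraction of the true box area covered by the estimated box.

    Boxes are (x_min, y_min, x_max, y_max) in mm.
    """
    width = min(est_box[2], true_box[2]) - max(est_box[0], true_box[0])
    height = min(est_box[3], true_box[3]) - max(est_box[1], true_box[1])
    if width <= 0 or height <= 0:
        return 0.0
    true_area = (true_box[2] - true_box[0]) * (true_box[3] - true_box[1])
    return width * height / true_area

def centroid_distance(est_box, true_box):
    """Euclidean distance (mm) between the centers of the two boxes."""
    ex, ey = (est_box[0] + est_box[2]) / 2, (est_box[1] + est_box[3]) / 2
    tx, ty = (true_box[0] + true_box[2]) / 2, (true_box[1] + true_box[3]) / 2
    return math.hypot(ex - tx, ey - ty)

def summarize(pairs, threshold=0.5):
    """Return (fraction of pairs reaching the overlap threshold, mean distance)."""
    hits = sum(1 for est, true in pairs
               if overlap_fraction(est, true) >= threshold)
    mean_dist = sum(centroid_distance(est, true) for est, true in pairs) / len(pairs)
    return hits / len(pairs), mean_dist

pairs = [((2, 0, 12, 10), (0, 0, 10, 10)),    # 80% overlap, centers 2 mm apart
         ((20, 20, 30, 30), (0, 0, 10, 10))]  # no overlap
rate, mean_dist = summarize(pairs, threshold=0.5)
```

Calling `summarize` with `threshold=0.75` gives the stricter criterion used alongside the 50% one.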

V.  REGISTRATION  RESULTS 

A.  Stage  1— Initial  estimate  of  search  region 

At this stage an initial estimation of the mass location on the prior mammogram was carried out based on the geometrical position of the mass on the current mammogram. Based on observation of the radial deviation errors and the angular deviation errors, the fan-shaped search region was estimated to be $\epsilon = 0.25 + 5/R_{curr}$ radians and $\delta = 20$ mm. This definition of the fan-shaped search region resulted in an average search area of 1462 mm² on the prior mammograms. For the 124 temporal image pairs used in this study, the Euclidean distance between the initial estimate of the centroid location of the corresponding structure on the prior mammogram and the center of the bounding box of the mass provided by the radiologist was estimated. The average Euclidean distance error of the initial estimate was 8.4±5.4 mm. The error distributions for both the malignant and the benign pairs are shown in Fig. 11. At this initial stage, 57% of the estimated lesion locations resulted in an area overlap of at least 50% with the true lesion locations and 27% resulted in an area overlap of at least 75% (Fig. 12).

B. Stage 2: Refinement of search region by warping and alignment

At the second stage, the location of the search region on the prior mammogram was first refined by maximizing a correlation measure between the fan-shaped template extracted from the current mammogram and the breast structures on the prior mammogram. The affine transformation in combination with simplex optimization was then employed to warp this local region. For the 124 temporal image pairs, the average Euclidean distance error after the second stage was 7.5±5.4 mm. At this stage, 59% of the estimated lesion locations resulted in an area overlap of at least 50% with the true lesion locations, and 36% resulted in an area overlap of at least 75%. The average Euclidean distance error at this stage was reduced compared to that of the first stage; however, the reduction did not achieve statistical significance (p=0.07).

Fig. 11. Distribution of Euclidean distance error between the initial estimate of the mass centroid location on the prior mammogram and the center of the bounding box of the mass provided by the radiologist for the malignant and benign pairs after the first detection stage.

After the simplex optimization, the search region was reduced to a constant size of 24 mm × 24 mm (= 576 mm²) centered at the refined fan-shaped region for every prior mammogram.
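
As a quick check on these figures, and assuming the 800 µm pixel size given in the Data Set section applies to the images being searched, the reduced region works out to

$$24\ \text{mm} \times 24\ \text{mm} = 576\ \text{mm}^2, \qquad \frac{24\ \text{mm}}{0.8\ \text{mm/pixel}} = 30\ \text{pixels per side}, \qquad \frac{576\ \text{mm}^2}{1462\ \text{mm}^2} \approx 0.39,$$

so the Stage 3 search covers roughly 39% of the average area of the initial fan-shaped region.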

C.  Stage  3— Mass  template  matching  and  localization 
of  corresponding  lesion 

At this final stage, a search for the best match between the lesion template from the current mammogram and a structure on the prior mammogram was carried out within the refined search region. This template matching resulted in 87% of the estimated lesion locations having an area overlap of at least 50% with the true lesion locations. The distributions of the Euclidean error for the malignant and the benign temporal pairs are shown in Fig. 13. The average distance between the estimated and the true centroids of the lesions on the prior mammogram for all 124 pairs was 4.2±5.7 mm with a maximum of 31.6 mm. These results are summarized in Table I. For the 87% of the temporal pairs with at least 50% overlap, the average distance between the estimated and the true centroids of the lesions on the prior mammogram was 2.4±2.1 mm with a maximum of 10.2 mm. When a more stringent criterion of 75% overlap is imposed, 82% of the masses on the prior mammograms are considered to be localized (Fig. 14). For the 82% of the temporal pairs with at least 75% overlap, the average distance between the estimated and the true centroids of the lesions on the prior mammogram was 2.2±1.9 mm with a maximum of 10.2 mm. The average Euclidean distance error at this stage was significantly reduced compared to the error of the first stage (p=0.000001) and the error of the second stage (p=0.000001).

Fig. 12. Distribution of the area overlap between the estimated and the true lesion locations for 124 temporal pairs after the first detection stage.

Fig. 13. Distribution of Euclidean distance error between the estimate of the mass centroid location on the prior mammogram and the center of the bounding box of the mass provided by the radiologist for the malignant and benign pairs after the final detection stage.

D.  Study  of  the  importance  of  the  stage  2  procedures 

The effect of the two procedures at Stage 2 on the registration accuracy was studied. We removed them one at a time and evaluated the registration results. When the first (correlation) procedure was removed, the average Euclidean distance error increased to 5.6±8.2 mm in the final stage. Only 81% of the estimated lesion locations resulted in an area overlap of at least 50% with the true lesion locations and 75% resulted in an area overlap of at least 75% with the true lesion locations. When the second (warping) procedure was removed, the average Euclidean distance error increased to 5.0±6.3 mm in the final stage. Only 82% of the estimated lesion locations resulted in an area overlap of at least 50% with the true lesion locations and 76% resulted in an area overlap of at least 75% with the true lesion locations.

Table I. The Euclidean distance between the true and the estimated centroids of the mass on the prior mammogram for the three detection stages.

                                 Overall     50% overlap    75% overlap
Stage 1    Mean distance         8.4 mm      5.6 mm         4.5 mm
           Standard deviation    5.4 mm      2.8 mm         2.6 mm
           Max. distance         29.0 mm     16.2 mm        13.8 mm
Stage 2    Mean distance         7.5 mm      4.9 mm         3.9 mm
           Standard deviation    5.4 mm      3.0 mm         2.6 mm
           Max. distance         32.0 mm     16.9 mm        11.6 mm
Stage 3    Mean distance         4.2 mm      2.4 mm         2.2 mm
           Standard deviation    5.7 mm      2.1 mm         1.9 mm
           Max. distance         31.6 mm     10.2 mm        10.2 mm

Fig. 14. Distribution of the area overlap between the estimated and the true lesion locations for 124 temporal pairs after the final detection stage.

VI.  DISCUSSION 

The approach proposed here has simplified the first stage compared to our previous method. In the previous method, the distances between the nipple and the breast centroid on the current and prior mammograms were determined and used to estimate a radial scaling factor. The angular location of the mass was measured from the nipple-breast centroid axis. A global alignment procedure was used for determination of the breast centroids. With our new approach we eliminated the scaling for the radial distance between the nipple and the mass location of the prior mammogram. The breast periphery was used as a reference for the estimation of the angular position of the mass. Therefore, there was no need to determine the breast centroids on the current and the prior mammograms, and the global alignment procedure could be eliminated. This is possible because the local alignment step provides better compensation for the displacement of the corresponding masses on the current and the prior mammogram caused by different compression and positioning of the breast.

It was found that the estimation of the angular position from the breast periphery allowed more precise localization of the mass position on the prior mammogram compared to our previous method, where the angular position of the mass was estimated based on the nipple-breast centroid axis. There is a large variability in the estimation of the breast centroid location because the extent of the breast imaged on the mammogram at the chest wall and at the axillary tail in the MLO view depends on the breast positioning and compression. This causes an uncertainty in defining the region to calculate the breast centroid. In the previous study using 74 temporal pairs, the estimated Euclidean distance error at the first stage was 9.8±6.0 mm. The fan-shaped search region was defined as $\epsilon = 0.35 + 5/r$, resulting in an average area of 1865 mm² for the fan-shaped search region. In the current study, the estimated Euclidean distance error at the first stage was reduced to 8.4±5.4 mm even though the data set was increased to 124 temporal pairs of mammograms. This allows the fan-shaped region to be reduced to $\epsilon = 0.25 + 5/r$, resulting in an average fan-shaped search area of 1462 mm² on the prior images. The reduction of the search area improves the chance of correctly localizing the mass on the prior mammogram.

The second stage combined two procedures. First, the localization of the search region on the prior mammograms was refined by maximizing a correlation measure between the fan-shaped template extracted from the current mammogram and the breast structures on the prior mammogram. The affine transformation in combination with simplex optimization was then employed to warp and locally align the template with the breast structures. Both procedures improved the detection process. When one of these procedures was removed, the registration results deteriorated, as discussed in the Results section.

With these improvements, the accuracy of the current regional registration technique is improved over the previous method.^{10} The current technique produced an average Euclidean distance error of 4.2±5.7 mm, compared to 5.4±7.5 mm when the previous technique was applied to the current data set. This difference is statistically significant (p=0.03). With the current technique, 82% of the estimated lesion locations resulted in an area overlap of at least 75% with the true lesion locations, compared with 72% when applying the previous technique to the current data set. It is interesting to note that, of the 21 “masses” on the prior mammograms for which the experienced radiologist could not confidently define the mass and measure its size, our registration technique localized 19 with an area overlap greater than 50%.

The average distance between the estimated and the true centroid of the lesions on the prior mammogram for the subset of temporal pairs having 50% overlap is about half of that of the entire data set (Table I). The maximum distance for this subset is about 1/3 of that for the entire data set.

With the current regional registration technique, 16 temporal pairs (13% of 124 temporal pairs) have an area overlap less than 50%. Twelve of the 16 computer-estimated locations do not overlap at all with the radiologist’s identified locations, and the other four pairs have an overlap between 1% and 49%. Seven of the 16 are benign and nine are malignant. A major cause of the misregistration was that the mass was small and subtle and a breast structure within the search region had a higher correlation with the mass template from the current mammogram. Figures 15 and 16 show the visibility ratings and sizes of these misregistered masses. Eight of the nine misregistered malignant masses have visibility ratings of 9 or 10 and sizes below 5 mm. The misregistered benign masses are somewhat more obvious and larger in size than the malignant ones. Since many of the masses on the prior mammograms were not interpreted as a mass without reference to the current mammograms, the automatic registration with template matching would be difficult with these masses if the search region contains normal but dense breast structures. We are currently investigating the application of local mass detection in the search region to focus template matching on a few suspicious areas. Morphological and texture features will be extracted from the potential mass areas to provide additional matching information in the feature space.

The interval change analysis, when fully developed, will be one of the functions provided in an integrated CAD system. The mass on the current mammogram can be detected by an automated mass detection algorithm or identified by a radiologist. The CAD system will then analyze whether the mass is an existing or a newly developed lesion and will estimate its likelihood of malignancy. We are developing methods for characterization of malignant and benign masses based on analysis of interval changes in the mass features. Investigation of criteria to determine whether a mass exists on the prior mammogram is underway. If the mass is a newly developed lesion on the current mammogram, it will then undergo a single-exam analysis by the CAD system.

VII.  CONCLUSION 

We are developing an automated registration technique for analysis of interval change of a mass from a previous mammographic exam to the current one. In this study we found that a local affine transformation in combination with nonlinear simplex optimization can improve the localization and reduce the size of the search region. With the improved method, 87% of the estimated lesion locations in 124 randomly selected temporal pairs resulted in an area overlap of at least 50% with the true lesion locations. When the threshold for correct localization was set to 75% area overlap, 82% of the temporal pairs still exceeded this threshold. The average distance between the estimated and the true centroids of the lesions on the prior mammogram over all pairs was 4.2±5.7 mm. The registration accuracy of the current method has been improved in comparison with that of our previous method^{10} even though the data set was increased from 74 pairs to 124 pairs. This improvement is obtained mainly from the second-stage affine transformation and simplex optimization. Additional studies are currently underway to develop a feature matching method to further improve lesion localization.

APPENDIX A: DEFINITION OF THE FAN-SHAPED REGION ON THE PRIOR MAMMOGRAM

Referring to Figs. 3 and 4, the fan-shaped region on the prior mammogram is drawn using the nipple centroid on the prior mammogram, $N'$, as the center of the coordinate system. The two bounding arcs are drawn using the radial distances $R_{curr} + \delta$ and $R_{curr} - \delta$, both centered at $N'$. The two sides of the fan-shaped region are bounded by two radial lines that form angles $\epsilon$ and $-\epsilon$ with the line $|N'M'|$. Thus the initial fan-shaped search region is centered at the predicted location of the mass centroid $M'$ on the prior mammogram (Fig. 4).

The constants $k_1$, $k_2$, and $k_3$ were chosen experimentally based on analysis of the angular deviation errors and the corresponding radial deviation errors for the 124 temporal pairs. The radial deviation error is defined as the difference between the predicted and the true distance of the mass from the nipple on the prior mammogram. The constants $k_1$ and $k_2$ are obtained in such a way that $\epsilon$ is the smallest upper bound that can enclose all angular deviation errors for all radial distances ($R_{curr}$) of the temporal pairs. The selection of the parametric form of $\epsilon$ was discussed in detail in Ref. 10. It reduced $\epsilon$ at larger $R_{curr}$. The constant $k_3$ was chosen to be equal to the maximum radial deviation error.

APPENDIX  B:  SIMPLEX  OPTIMIZATION 

An optimization problem can be defined as an error function that has to be minimized by iterative selection of the values of its $n$ parameters. We can define an $(n+1)$-dimensional space, where $n$ dimensions (degrees of freedom) correspond to the error function parameters, and one dimension is the error function itself. When the error function is calculated for all possible values of the $n$ parameters, an error surface in the $(n+1)$-dimensional space will be obtained. Usually the error functions for real-world applications are complex and nonlinear, and the corresponding error surfaces contain local minima.

The nonlinear simplex optimization by Nelder and
Mead defines a hyper-polygon with n+1 vertexes in an
(n+1)-dimensional space. For each vertex the error function
is calculated. The polygon is then “rolled” towards the
minimum. The movement of the polygon (towards the minimum)
is obtained by reflection in the direction opposite to
the vertex (K) with the maximal error. To achieve this, the
center of mass (L) of the hyper-polygon vertexes is calculated.
A line KL connects the center of mass with the
vertex with the maximal error. The new vertex (K') is obtained
by central projection of the vertex K on the line KL
with center L and |K'L| = t|KL|. The coefficient t determines
how far the new vertex will be projected and what the
corresponding size of the hyper-polygon will be. The larger
the hyper-polygon is, the more easily it will avoid (“roll over”)
the local minima on the error surface. However, it will be
difficult to get close to the global minimum if its size is too
large. On the other hand, although a small hyper-polygon
can get into close proximity to the global minimum,
it is more likely to be trapped in a local minimum. The
magnitude of the coefficient t is controlled adaptively by the
Nelder and Mead algorithm. If a large reduction in the
error is detected for the new vertex, the magnitude of t is
increased. If the error is found to be increased for the
new vertex, the magnitude of t is decreased.
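
The reflection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reflect_worst_vertex` is hypothetical, the center L is taken as the centroid of the vertexes other than K (the usual Nelder and Mead convention, which the text does not spell out), and the adaptive control of t is not shown.

```python
def reflect_worst_vertex(vertices, error_fn, t=1.0):
    """Return a copy of the simplex with its worst vertex K replaced by K'.

    vertices: list of equal-length tuples (the simplex vertexes).
    error_fn: callable mapping a vertex to its error value.
    t:        projection coefficient, so that |K'L| = t * |KL|.
    """
    errors = [error_fn(v) for v in vertices]
    worst = max(range(len(vertices)), key=lambda i: errors[i])
    k = vertices[worst]
    others = [v for i, v in enumerate(vertices) if i != worst]
    # Assumption: L is the centroid of the vertexes other than K.
    center = [sum(coords) / len(others) for coords in zip(*others)]
    # K' lies on the line KL, on the far side of L from K.
    k_new = tuple(c + t * (c - x) for c, x in zip(center, k))
    new_vertices = list(vertices)
    new_vertices[worst] = k_new
    return new_vertices
```

With t = 1 the worst vertex is mirrored through L; larger t stretches the hyper-polygon and smaller t contracts it, which is the size trade-off discussed in the paragraph above.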
In this paper, the nonlinear simplex optimization by
Nelder and Mead was used to adjust the coefficients a, b, c,
d, e, and f and to warp the fan-shaped template, thereby
maximizing the correlation (C) between the template and a
breast structure on the prior mammogram. Therefore, the
dimensionality of the space was 7: six parameters to be
adjusted plus the error function to be minimized, which was
defined as 1 - C.
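
As a hedged illustration of the error function 1 - C, the sketch below assumes that C is a Pearson correlation computed over corresponding pixel values of the warped template and the image region; the text does not state which correlation measure was used, and the function names are hypothetical.

```python
import math

def correlation(template, target):
    """Pearson correlation between two equally sized lists of pixel values.

    Assumes neither list is constant (a zero variance would divide by zero).
    """
    n = len(template)
    mean_t = sum(template) / n
    mean_g = sum(target) / n
    cov = sum((a - mean_t) * (b - mean_g) for a, b in zip(template, target))
    var_t = sum((a - mean_t) ** 2 for a in template)
    var_g = sum((b - mean_g) ** 2 for b in target)
    return cov / math.sqrt(var_t * var_g)

def registration_error(template, target):
    """Error to be minimized by the simplex search: 1 - C."""
    return 1.0 - correlation(template, target)
```

Minimizing this error over the six warping coefficients is equivalent to maximizing C, since the error is 0 for a perfect match and grows as the correlation drops.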
A new classification scheme was developed to classify mammographic masses as malignant and
benign by using interval change information. The masses on both the current and the prior mammograms
were automatically segmented using an active contour method. From each mass, 20 run
length statistics (RLS) texture features, 3 spiculation features, and 12 morphological features were
extracted. Additionally, 20 difference RLS features were obtained by subtracting the prior RLS
features from the corresponding current RLS features. The feature space consisted of the current
RLS features, the difference RLS features, the current and prior spiculation features, and the
current and prior mass sizes. Stepwise feature selection and linear discriminant analysis classification
were used to select and merge the most useful features. A leave-one-case-out resampling
scheme was used to train and test the classifier using 140 temporal image pairs (85 malignant, 55
benign) obtained from 57 biopsy-proven masses (33 malignant, 24 benign) in 56 patients. An
average of 10 features were selected from the 56 training subsets: 4 difference RLS features, 4 RLS
features, and 1 spiculation feature from the current image, and 1 spiculation feature from the prior,
were most often chosen. The classifier achieved an average training A_z of 0.92 and a test A_z of 0.88.
For comparison, a classifier was trained and tested using features extracted from the 120 current
single images. This classifier achieved an average training A_z of 0.90 and a test A_z of 0.82. The
information on the prior image significantly (p = 0.015) improved the accuracy for classification of
the masses. © 2001 American Association of Physicists in Medicine. [DOI: 10.1118/1.1412242]

Key  words:  computer-aided  diagnosis,  interval  change,  classification,  feature  analysis, 
mammography,  malignancy 


I. INTRODUCTION

Mammography is currently the most effective method for
early breast cancer detection. Analysis of interval changes
is an important method used by radiologists in mammographic
interpretation to detect developing malignancy. A
variety of computer-aided diagnosis (CAD) techniques have
been developed to detect abnormalities and to distinguish
malignant and benign lesions on mammograms. We are
studying the use of CAD techniques to assist radiologists in
interval change analysis.

Commonly used lesion classification methods for CAD
employ information from a single image. These methods
have been shown to perform well in lesion classification
problems. However, when mammograms from multiple
examinations are available, it can be expected that even
higher accuracy may be achieved if the computer can utilize
the interval change information for classification. New computer
vision methods will have to be designed to extract
features characterizing temporal changes and to improve the
differentiation between benign and malignant masses.

A number of researchers have developed algorithms to
register the mass on current and prior mammograms. Sallam
et al. have proposed a warping technique for mammogram
registration based on manually identified control points. A
mapping function was calculated for matching each point on
the current mammogram to a point on the prior mammogram.
Brzakovic et al. have investigated a three-step
method for comparison of the most recent and the prior
mammograms. They first registered two mammograms using
the method of principal axis, and partitioned the current
mammogram using a hierarchical region-growing technique.
Translation, rotation, and scaling were then used for registration
of the partitioned regions. Vujovic et al. have proposed
a multiple-control-point technique for mammogram registration.
They first determined several control points independently
on the current and prior mammograms based on the
intersection points of prominent anatomical structures in the
breast. A correspondence between these control points was
established based on a search in a local neighborhood around
the control point of interest.

The previous techniques depend on the identification of
control points. Furthermore, these studies aimed at registration
without using the results for interval change analysis.

Gopal et al. and Hadjiiski et al. have developed a
multistage technique that defines a transformation to locally
map the position of the mass on a current mammogram to a
search region on the prior mammogram. A local search for
the exact mass location is then performed on the prior mammogram.
Good et al. have developed a technique that defines
a transformation to map all points from the current
mammogram onto a prior mammogram. The current mammogram
is then subtracted from the prior mammogram.


Med. Phys. 28 (11), November 2001

0094-2405/2001/28(11)/2309/9/$18.00

© 2001 Am. Assoc. Phys. Med.

Hadjiiski et al.: Analysis of temporal changes of mammographic features


Few studies have been performed so far in the area of
automated classification of breast masses based on the interval
change information. Gopal et al. and Hadjiiski
et al. have carried out a preliminary study of the classification
scheme that combines prior and current information
automatically extracted from masses on prior and current
mammograms, respectively. The classifier using the combined
prior and current information performed better than the
classifier using current information alone. To our knowledge,
no other studies that describe automated classification of
malignant and benign breast lesions based on temporal changes
of mammographic features have been reported.

The goal of our research is to develop a CAD method for
automated analysis of interval changes to be used as an aid to
radiologists for detection and classification of malignant and
benign lesions on mammograms. In this study, we conducted
a preliminary investigation to demonstrate the feasibility of
analyzing temporal differences in the texture and morphological
features between a mass on the most recent mammogram
and a prior mammogram of the same view for the
classification task. Additionally, we compared this method
with two classification methods, one based on
information extracted from the current mammograms alone
and the other based on information extracted from the
prior mammograms alone.


II. MATERIALS AND METHODS

The new classification technique is based on the design of
features that characterize the temporal change in the lesion of
interest between two mammographic examinations. The
mass to be analyzed can either be identified manually by a
radiologist or automatically by a computerized detection program.
In this study, the mass on each mammogram was identified
by an MQSA certified radiologist. The masses on both
the current and the prior mammograms were automatically
segmented using an active contour method that has been
discussed in detail elsewhere. Examples of the segmentation
are shown in Figs. 2 and 3 for a malignant and a benign
mass, respectively. Features that characterized mammographic
masses including texture features, morphological
features, and spiculation features were extracted from each
mass. Three of the morphological features are related to the
mass size. Additionally, difference features were obtained by
subtracting a feature of the prior mass from the corresponding
feature of the current mass. The current, prior, and difference
features formed a multidimensional feature space for
the classification task. Stepwise feature selection applied to
linear discriminant analysis (LDA) was used to select the
most useful features. The selected features were then used as
the input predictor variables for the LDA classifier (Fig. 1).
The classifier was trained and tested by a leave-one-case-out
resampling scheme. A case was considered to contain all
regions of interest from a given patient. In each resampling
step, the temporal pairs from 55 cases were used for feature
selection and formulation of the linear discriminant function,
while the temporal pairs from the left-out case were used for
testing the trained classifier. A total of 56 training and testing
steps were obtained from the 56 cases. The classification
results from the 56 test cases were accumulated to evaluate
the classifier performance. Since the data set in this study
was still small, we chose the feature selection parameters
such that the dimensionality of the input feature vector for
the LDA classifier was small in order to reduce the possibility
of over-training. The feature selection procedure is discussed
in Sec. II C.

Fig. 1. Block diagram of the classification method.
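
The leave-one-case-out partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the use of one case label per temporal pair are assumptions, and the feature selection and LDA training that would run on each training partition are omitted.

```python
def leave_one_case_out(case_ids):
    """Yield (left_out_case, train_indices, test_indices) once per case.

    case_ids: one case (patient) label per temporal pair, so that all
    pairs from the same patient always stay on the same side of the split.
    """
    for left_out in sorted(set(case_ids)):
        train = [i for i, c in enumerate(case_ids) if c != left_out]
        test = [i for i, c in enumerate(case_ids) if c == left_out]
        yield left_out, train, test
```

With 56 distinct case labels this generator produces 56 training and testing steps, each training on the pairs from 55 cases. Keeping all pairs of a patient together prevents a mass from appearing in both the training and the test partition.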

To evaluate the improvement in performance of the classifier
designed by using the temporal change information, two
additional classifiers were obtained. One of them was trained
using the information extracted from the current single images
of the temporal pairs. We will refer to these images as
current images. The other classifier was trained using the
information extracted from the prior single images of the
temporal pairs, and we will refer to these images as prior
images. Comparison of the three classifiers will reveal the
effectiveness of interval change analysis for the classification
of malignant and benign masses.

A.  Data  set 

A set of 140 temporal pairs of mammograms containing
biopsy-proven masses on the current mammograms was used
to examine the performance of this approach. The data set
consisted of 241 mammograms from 56 patients. The mammograms
were digitized with a LUMISCAN 85 laser scanner
at a pixel resolution of 50 μm × 50 μm and 4096 gray levels.
The digitizer was calibrated so that gray level values were
linearly proportional to the optical density (OD) within the
range of 0-4 OD units, with a slope of 0.001 OD/pixel
value. The digitizer output was linearly converted so that a
large pixel value corresponded to a low optical density. The
image matrix size was reduced by averaging every 2×2
adjacent pixels and downsampled by a factor of 2, resulting in
images with a pixel size of 100 μm × 100 μm for further
analysis.
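
The 2×2 averaging and downsampling step can be illustrated with a short sketch. The function name is hypothetical, the image is represented as a plain list of rows, and dropping an odd trailing row or column is an assumption that the text does not address.

```python
def downsample_2x2(image):
    """Average each non-overlapping 2x2 block of a 2D list of pixel values.

    Halves the matrix size in each direction, e.g. from 50 um to 100 um
    pixels. An odd trailing row or column is dropped (assumption).
    """
    rows = len(image) // 2
    cols = len(image[0]) // 2
    out = []
    for r in range(rows):
        top, bottom = image[2 * r], image[2 * r + 1]
        out.append([
            (top[2 * c] + top[2 * c + 1]
             + bottom[2 * c] + bottom[2 * c + 1]) / 4.0
            for c in range(cols)
        ])
    return out
```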

There were 57 biopsy-proven masses (33 malignant and
24 benign) in the 56 cases. The 241 mammograms contained
different mammographic views (CC, MLO, and lateral
views) and multiple examinations of the masses including
the examination when the biopsy decision was made. By
matching masses of the same view from two different examinations,
a total of 140 temporal pairs were formed, of which
85 were malignant and 55 benign. A malignant temporal pair
consisted of a biopsy-proven malignant mass or a mass that
was initially not recommended for biopsy and later found to
be malignant by biopsy in a future year. A similar definition
was used for the benign temporal pairs. Within a pair, the
current mammogram was defined as the mammogram with
the later date, and the prior mammogram was defined as the
one with the earlier date. Therefore, in cases with three
consecutive exams, more than one temporal pair could be
formed and two of the mammograms could be called “current.”
Among the 140 temporal pairs, we had 120 unique
current mammograms. Of the masses in the 120 current
mammograms, 70 were malignant and 50 benign.

Fig. 2. A malignant mass: (a) the mass in a prior year mammogram (1997),
(b) mass outline obtained by active contour segmentation, (c) the mass in a
current year mammogram (1998), (d) mass outline obtained by active contour
segmentation.

Fig. 3. A benign mass: (a) the mass on a prior year mammogram (1995), (b)
mass outline obtained by active contour segmentation, (c) the mass on a
current year mammogram (1996), (d) mass outline obtained by active contour
segmentation.

Since all cases in this data set had undergone biopsy, the
benign masses in this set could not be distinguished easily
from the malignant ones based on current mammographic
criteria. Changes occurred for the benign masses that
prompted the radiologists to recommend biopsy. Examples of
such cases are shown in Figs. 2 and 3. The malignant mass in
Fig. 2 did not increase in size but changed its density. The
benign mass (Fig. 3), on the other hand, appeared to have
spicules. For the malignant masses in this data set, the average
mass size, estimated by the radiologist as the longest
dimension of the mass on the mammogram, was 8.2 mm on
the prior mammograms and 12.7 mm on the current mammograms.
The corresponding sizes were 10.6 and 12.2 mm,
respectively, for the benign masses. As discussed in Sec. IV,
25 of the masses on the prior mammograms were too subtle
for the radiologist to estimate their sizes. The average sizes
given previously were obtained after excluding all temporal
pairs that involved these masses.

The radiologist also rated the visibility of the masses on
the mammograms relative to those encountered in clinical
practice on a 10-point scale, with 1 representing the most
obvious and 10 representing the most subtle masses. The
visibility of the masses on the current mammogram is plotted
against those on the prior mammogram in Fig. 4 for the
malignant and benign temporal pairs. Generally the malignant
masses were less visible on the prior than on the current
mammograms while the visibility of the benign masses was
found to be more similar on the current and prior mammograms.
The mean difference in the visibility rating between
the prior and the current mammograms for the malignant
masses is 2.8 compared to 1.2 for the benign masses (p
= 0.0007 with an unpaired t-test between the malignant and
benign masses). The correlation coefficient is 0.02 for malignant
masses [Fig. 4(a)] and 0.37 for benign masses [Fig.
4(b)]. In addition, the radiologist also estimated the likelihood
of malignancy of the current masses on a 10-point confidence
scale (1 = definitely benign and 10 = definitely malignant)
based on the 120 current mammograms alone without
comparison with the prior (Fig. 5). The temporal pairs had a
time interval of 6-36 months (Fig. 6). More than 70% of the
pairs had a time interval of 12 months.

B. Feature extraction

A rectangular region of interest (ROI) was defined to include
the radiologist-identified mass with an additional surrounding
breast tissue region of at least 40 pixels wide from
any point of the mass border. A fully automated method was
then used for segmentation of the mass from the breast tissue
background within the ROI. The masses on both the current
and the prior mammograms were automatically segmented
within the ROI using a two-dimensional active contour
method that was initialized by K-means clustering.

Fig. 4. Visibility of the masses on the current mammogram plotted against
those on the prior mammogram for (a) malignant and (b) benign temporal
pairs. The visibility was rated on a 10-point discrete scale (1 = most obvious,
10 = most subtle). Because many of the data points overlap, we indicate the
number of points with the same rating by a number next to the symbol (m or
b). The diagonal line on the graph represents the cases when the current and
the prior mass sizes are identical. The dashed lines are the linear regression
lines for the data defined by y = 0.038x + 7.86 for (a) and by y = 0.857x
+ 1.742 for (b). The correlation coefficient for malignant masses is 0.02 and
for benign masses is 0.37.

Fig. 5. The distribution of the malignancy ranking of the masses in the 120
current mammograms. The rating was performed by an experienced MQSA
radiologist (1: definitely benign, 10: definitely malignant).

The texture features used in this study were calculated
from run-length statistics (RLS) matrices. The RLS matrices
were computed from the images obtained by the rubber
band straightening transform (RBST). The RBST maps a
band of pixels surrounding the mass onto the Cartesian plane
(a rectangular region). In the transformed image, the mass
border appears approximately as a horizontal edge, and
spiculations appear approximately as vertical lines. A complete
description of the RBST can be found in the literature.
RLS texture features were extracted from the vertical and
horizontal gradient magnitude images, which were obtained
by filtering the RBST image with horizontally or vertically
oriented Sobel filters and computing the absolute gradient
values of the filtered image. Five texture measures, namely,
short run emphasis (SRE), long run emphasis (LRE), gray
level nonuniformity (GLN), run length nonuniformity
(RLN), and run percentage (RP) were extracted from the
vertical and horizontal gradient images in two directions,
θ = 0° and θ = 90°. Therefore, a total of 20 RLS features
were calculated for each ROI. The definition of the RLS
feature measures can be found in the Appendix and in the
literature.
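
The five run-length measures named above can be sketched for the θ = 0° direction as follows. The formulas are the commonly used run-length definitions and are assumed here to match those in the Appendix; the function names and the representation of the image as a list of rows of quantized gray levels are illustrative only.

```python
from collections import defaultdict

def run_length_matrix(image):
    """Count horizontal (theta = 0 degrees) runs in a 2D list of gray levels.

    Returns a dict mapping (gray_level, run_length) to the number of runs.
    """
    runs = defaultdict(int)
    for row in image:
        start = 0
        for k in range(1, len(row) + 1):
            if k == len(row) or row[k] != row[start]:
                runs[(row[start], k - start)] += 1
                start = k
    return dict(runs)

def rls_features(runs, n_pixels):
    """Compute SRE, LRE, GLN, RLN, and RP from a run-length matrix."""
    n_runs = sum(runs.values())
    per_level = defaultdict(int)   # runs summed over lengths, per gray level
    per_length = defaultdict(int)  # runs summed over gray levels, per length
    for (level, length), count in runs.items():
        per_level[level] += count
        per_length[length] += count
    return {
        "SRE": sum(c / (j * j) for (_, j), c in runs.items()) / n_runs,
        "LRE": sum(c * j * j for (_, j), c in runs.items()) / n_runs,
        "GLN": sum(v * v for v in per_level.values()) / n_runs,
        "RLN": sum(v * v for v in per_length.values()) / n_runs,
        "RP": n_runs / n_pixels,
    }
```

Runs in the θ = 90° direction can be obtained by passing the transposed image, for example `run_length_matrix(list(zip(*image)))`. Applying both directions to both gradient magnitude images gives 5 × 2 × 2 = 20 features, matching the count in the text.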

Fig. 6. Temporal interval between the current and the prior mammograms
for the 140 temporal pairs in our data set.

Morphological features were extracted from the automatically
segmented mass shape. Five of the morphological features
were based on the normalized radial length (NRL),
defined as the Euclidean distance from the object's centroid to
each of its edge pixels, i.e., the radial length, and normalized
relative to the maximum radial length for the object. The
following five NRL features were extracted: mean
(NRLAVG), standard deviation (NRLSD), entropy
(NRLENT), area ratio (NRLAREAR), and zero crossing count
(NRLZCC). In addition, the perimeter (PERIM), area (AREA),
circularity (CIRC), rectangularity (SQR), contrast (CONT),
perimeter-to-area ratio (CRR), and Fourier descriptor (FF)
features were extracted. The definitions of the morphological
features can be found in the literature. Three of the
morphological features (perimeter, area, and perimeter-to-area
ratio) are related to the mass size and thus are feature
descriptors of the mass size.
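
A rough sketch of three of the NRL features follows. It rests on several assumptions that the text leaves to the cited literature: the centroid is approximated by the mean of the border points rather than of all object pixels, the standard deviation is the population form, and the zero crossing count is taken as the number of times the NRL crosses its mean when the border is traversed as a closed curve.

```python
import math

def nrl_features(border):
    """NRLAVG, NRLSD, and NRLZCC from an ordered list of (x, y) border points."""
    n = len(border)
    # Assumption: centroid approximated by the mean of the border points.
    cx = sum(x for x, _ in border) / n
    cy = sum(y for _, y in border) / n
    radial = [math.hypot(x - cx, y - cy) for x, y in border]
    r_max = max(radial)
    nrl = [r / r_max for r in radial]  # normalized to the maximum radial length
    mean = sum(nrl) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in nrl) / n)
    # Count crossings of the mean, treating the border as a closed curve.
    above = [r > mean for r in nrl]
    zcc = sum(1 for i in range(n) if above[i] != above[i - 1])
    return {"NRLAVG": mean, "NRLSD": sd, "NRLZCC": zcc}
```

For a perfectly circular border all radial lengths are equal, so the mean is 1, the standard deviation is 0, and there are no crossings; irregular or lobulated borders raise the standard deviation and the crossing count.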

A spiculation measure was defined for each pixel on the
mass border by using the statistics of the image gradient
direction relative to the normal direction to the mass border.
The statistics were determined in a 90° sector centered about
the normal at the border pixel and outside of the mass
border. The spiculation measure for each border pixel
was normalized to be between 0 and π/2, with a value of π/4
indicating a random orientation of image gradients, and
larger values indicating a higher likelihood of spiculation.
Three features were extracted from the spiculation measure.
The first feature (AVG) was the average of the spiculation
measure for all pixels on the mass boundary. The second
feature (PERC_ABV) was the percentage of border pixels
with a spiculation measure larger than π/4, and the third
feature (AVG_ABV) was the average of the spiculation measure
for those pixels with a spiculation measure larger than
π/4.
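
Given the per-pixel spiculation measure, the three summary features reduce to simple statistics. The sketch below assumes the measures are already computed and normalized to the range 0 to π/2; the function name is hypothetical, and returning 0 for AVG_ABV when no pixel exceeds π/4 is an assumption.

```python
import math

def spiculation_features(measures, threshold=math.pi / 4):
    """AVG, PERC_ABV, and AVG_ABV from per-border-pixel spiculation measures."""
    above = [m for m in measures if m > threshold]
    return {
        "AVG": sum(measures) / len(measures),
        "PERC_ABV": 100.0 * len(above) / len(measures),
        # Assumption: report 0.0 when no border pixel exceeds the threshold.
        "AVG_ABV": sum(above) / len(above) if above else 0.0,
    }
```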

A total of 35 features (20 RLS, 12 morphological, and 3
spiculation) were therefore extracted from each ROI. Additionally,
difference features were obtained by subtracting a
prior feature from the corresponding current feature. Therefore,
35 difference features were derived from the 20 RLS,
12 morphological, and 3 spiculation features.

C.  Feature  selection 

In order to reduce the number of the features and to obtain
the best feature subset to design an effective classifier, feature
selection with stepwise linear discriminant analysis
was applied. At each step of the stepwise selection procedure
one feature is entered or removed from the feature pool by
analyzing its effect on the selection criterion. In this study,
the Wilks' lambda (the ratio of within-group sum of squares
to the total sum of squares) was used as a selection criterion.
The optimization procedure used a threshold F_in for
feature entry, a threshold F_out for feature removal, and a
tolerance threshold T for measuring feature correlation with
the other features. In a feature entry step, the features not yet
selected are entered into the selected feature pool one at a
time, and the significance of the change in the Wilks' lambda
caused by each feature is estimated based on F statistics. The
feature with the highest significance is entered into the feature
pool if its significance is higher than F_in and its correlation
value with the rest of the features in the pool is below
T. In a feature removal step, the features that have already
been entered in the selected feature pool are removed one at
a time and the significance of the change in the Wilks'
lambda is estimated. The feature with the least significance is
removed from the selected feature pool if the significance is
less than F_out. Since the appropriate values of F_in, F_out, and
T are not known a priori, we examined a range of F_in, F_out,
and T values using an automated simplex optimization
method. The appropriate thresholds were chosen in such
a way that a minimum number of features were selected to
achieve a high accuracy of classification by LDA. More details
about the stepwise linear discriminant analysis and its
application to CAD can be found elsewhere.

Table I. Classification results for the classifier based on the temporal
change information, the classifier based on current single image information,
and the classifier based on prior single image information.

Classification   Avg. No. of         Training A_z   Test A_z      Test partial
                 selected features                                A_z^(0.9)
Temporal pairs   10                  0.92           0.88 ± 0.03   0.37
Current images   11                  0.90           0.82 ± 0.04   0.32 ± 0.08
Prior images     4                   0.78           0.76 ± 0.04   0.24 ± 0.08
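
For a single candidate feature, the Wilks' lambda and the associated F statistic of the entry step can be computed directly, as in the hedged sketch below. It covers only the univariate case (the first entry into an empty feature pool); the multivariate lambda used once features are already in the pool, the tolerance check against T, and the removal step are not shown, and the function names are illustrative.

```python
def wilks_lambda(groups):
    """Univariate Wilks' lambda: within-group over total sum of squares.

    groups: list of lists, one list of feature values per class.
    """
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ss_within += sum((x - mean) ** 2 for x in g)
    return ss_within / ss_total

def f_to_enter(groups):
    """F statistic for entering one feature into an empty feature pool."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    lam = wilks_lambda(groups)
    return (1.0 - lam) / lam * (n - k) / (k - 1)
```

A smaller lambda means better class separation, and in this univariate case the F statistic equals the one-way analysis of variance F ratio. In a stepwise procedure, the candidate with the largest F would be compared against F_in.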

The feature selection in this study was performed by applying
the stepwise feature selection to the entire feature
space (combination of texture, spiculation, and morphological
features altogether) as well as subspaces obtained by different
combinations of the three feature subspaces: texture,
spiculation, and morphological features. The stepwise feature
selection uses a sequential forward inclusion and backward
elimination approach. The procedure does not exhaustively
evaluate all possible combinations of individual features. It is
therefore not optimal, especially when the feature space is
large and the training sample is small. By limiting the input
to the feature subspaces, the dimensionality was reduced
compared to the entire feature space. We found that better
feature subsets could be selected by the stepwise feature
selection in the subspaces than in the entire feature space.

D.  Evaluation  methods 

To evaluate the classifier performance, the training and
test discriminant scores were analyzed using receiver operating
characteristic (ROC) methodology. The discriminant
scores of the malignant and benign masses were used as
decision variables in the LABROC1 program, which fits a
binormal ROC curve based on maximum likelihood estimation.
The classification accuracy was evaluated as the area
under the ROC curve, A_z. The performances of the classifiers
were also assessed by estimating the partial area index
(A_z^(0.9)). The partial area index (A_z^(0.9)) is defined as the area
that lies under the ROC curve but above a sensitivity threshold
of 0.9 (TPF_0 = 0.9) normalized to the total area above
TPF_0, (1 - TPF_0). The partial A_z^(0.9) indicates the performance
of the classifier in the high sensitivity (low false negative)
region which is most important for a cancer detection
task.
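
The definition of the partial area index can be illustrated numerically. The sketch below works on a piecewise-linear ROC curve given as points, which is an assumption made for simplicity; the study itself fits a binormal curve with LABROC1, and that fitting is not reproduced here. The function name is hypothetical.

```python
def partial_area_index(fpf, tpf, tpf0=0.9):
    """Area under a piecewise-linear ROC curve and above TPF = tpf0,
    normalized by (1 - tpf0).

    fpf, tpf: ROC points sorted by increasing FPF, from (0, 0) to (1, 1).
    """
    area = 0.0
    points = list(zip(fpf, tpf))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        h0, h1 = y0 - tpf0, y1 - tpf0
        if h0 <= 0 and h1 <= 0:
            continue  # segment lies entirely at or below the threshold
        if h0 < 0:
            # Segment rises through the threshold: start at the crossing.
            x0 = x0 + (x1 - x0) * (-h0) / (h1 - h0)
            h0 = 0.0
        elif h1 < 0:
            # Not expected for a monotone ROC curve; handled for safety.
            x1 = x0 + (x1 - x0) * h0 / (h0 - h1)
            h1 = 0.0
        area += 0.5 * (h0 + h1) * (x1 - x0)  # trapezoid above the threshold
    return area / (1.0 - tpf0)
```

For the chance diagonal this gives (1 - TPF_0)/2, i.e., 0.05 at TPF_0 = 0.9, and for a perfect classifier it gives 1, which provides a scale for reading the partial area values reported in the Results.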

III.  RESULTS 

The performances of the classifiers based on the temporal
pairs, the current images, and the prior images are summarized
in Table I. The classifiers that achieved the highest test
A_z values with a small average number of features are presented
here. Table II is a summary of the features selected for
each classifier. For the 56 training subsets of temporal pairs
used in this study, an average of 10 features were selected for
the classification task. The most frequently selected features
included 4 difference RLS features (3 SRE and 1 LRE), 4
RLS features (2 SRE, 1 RLN and 1 RP), 1 spiculation feature
from the current image, and 1 spiculation feature from the
prior image (Table II). The LDA classifier achieved an average
training A_z of 0.92 and a test A_z of 0.88. The test partial
A_z^(0.9) was 0.37.

Table II. Selected features for classifiers based on temporal pairs, current images, and prior images. The letter
“H” or “V” at the beginning of the texture feature labels indicates that the features were extracted from the
horizontal or vertical gradient magnitude images, respectively. The number (0 or 90) at the end of the texture
feature labels shows the direction at which the features were extracted.

Columns: Feature type; Group; Features; Temporal pairs (Curr, Pr, Diff); Current images (Curr); Prior images (Pr)

Feature type    Group   Features    Selections
Texture         SRE     H_SRE_0     X
                        H_SRE_90    X X
                        V_SRE_0     X X X X
                        V_SRE_90    X
                LRE     V_LRE_0     X X
                        H_LRE_0     X
                RLN     V_RLN_0     X X
                RP      H_RP_0      X X
Spiculation             PERC_ABV    X X
                        AVG         X
                        AVG_ABV     X
Morphological           CRR         X
                        NRLZCC      X
                        PERIM       X
                        NRLAVG      X
                        SQR         X
                        CONT        X

For classification of malignant and benign masses using
the current single images (the current images of the temporal
pairs), the LDA classifier selected an average of 11 features
for the 56 training subsets. The most frequently selected features
were 4 RLS features (2 SRE, 1 LRE and 1 RLN), 1
spiculation feature, and 6 morphological features (Table II).
The classifier achieved an average training A_z of 0.90, a test
A_z of 0.82, and a test partial A_z^(0.9) of 0.32.

For the classification of masses based on the prior single
images alone, an average of 4 features were selected for the
56 training subsets. The most frequently selected features
were 3 RLS features (1 SRE, 1 LRE, and 1 RP) and 1 spiculation
feature. The LDA classifier achieved an average training
A_z of 0.78, test A_z of 0.76, and test partial A_z^(0.9) of 0.24.

The test ROC curves for the three classifiers are compared
in Fig. 7. The difference in the test A_z between the classifier
based on the temporal pairs and that based on the current
images alone is statistically significant (p = 0.015). The
difference in the test A_z between the classifier based on the
temporal pairs and that based on the prior images alone is
also statistically significant (p = 0.001). The partial area
index for the classifier based on the temporal pairs is also
improved compared to the classifiers based on the current or
the prior images alone, although the differences did not
achieve statistical significance.


IV.  DISCUSSION 

Texture and spiculation features were important for
malignant and benign classification of mammographic masses
for all three types of classifiers: the classifier based on
temporal pair information, the classifier based on current image
information, and the classifier based on prior image
information. One or more of the spiculation features were always
selected in all training partitions for all three classifiers. The
most frequently selected texture features were the short run
emphasis (SRE) features. They comprised more than 50% of
the texture features selected for the three classifiers (Table II).

The temporal-information-based classifier showed improved
performance compared to the classifiers based on current or
prior image information alone. The input feature space to the
temporal-information-based classifier included the current,
prior, and difference features. This allows the classifier to
choose either the individual features or the difference
features. Using the stepwise feature selection procedure and
the linear discriminant classifier, it was found that the texture
and the spiculation features contained useful temporal
information for malignant and benign mass classification.
Texture features appeared to be most informative in the form of
difference features, obtained by subtracting the prior feature
from the corresponding current feature (SRE and LRE
difference features). On the other hand, the best use of the
spiculation features appeared to be a direct combination of
the current and prior features in the input feature vector of the
LDA, since the individual features rather than their differences
were chosen.

[Fig. 7 plot; horizontal axis: False Positive Fraction]
Fig. 7. The test ROC curves for the classifiers based on temporal pair
information, current image information, and prior image information.

We found that better feature subsets could be selected by
the stepwise feature selection in the subspaces than in the
entire feature space. For example, for the
temporal-information-based classifier, a better feature subset with a
higher test A_z of 0.88 was found when the input feature space
included only the texture and spiculation subspaces. The
addition of the morphological feature subspace to the input
feature space reduced the highest test A_z to 0.84. Similarly,
in the case of the classifier based on prior image information,
a better feature subset was obtained when the texture and
spiculation feature subspaces were used in the input feature
space for stepwise feature selection. Again, the addition of
the morphological feature subspace to the input feature space
reduced the highest test A_z to 0.72. The classifier based on
current image information was the only one, among the
three, that obtained a better result, as shown in Table I, when
the morphological feature subspace was included in the input
feature space.

One reason for the poor performance of the morphological
features may be that the masses were more subtle in the
prior images. In fact, the experienced MQSA
mammographer was not confident in seeing 25 of the
"masses" on the prior images and could not provide a mass
size estimation for them. Although the active contour model
would stop the iteration based on the preset criteria and
find an "outline" of the masses on the prior mammograms,
these mass outlines were generally less reliable than those on
the current mammograms in providing morphological
characteristics of the masses. Texture features did not depend as
strongly on the precise mass boundary as morphological
features. Three out of the four features selected for classification
of the malignant and benign masses on the prior images were
RLS texture features. A spiculation feature was also found to
be a good discriminator.

We also performed ROC analysis of the malignancy
confidence ratings provided by the experienced MQSA
radiologist for the current image data set (120 images). The
distribution of the malignancy ratings is shown in Fig. 5; the
ratings resulted in an A_z value of 0.80 ± 0.04. This indicates that the
masses in the current mammograms cannot be easily
distinguished as malignant or benign even by an experienced
radiologist, consistent with the fact that all lesions had indeed
undergone biopsy. The classifier based on the current image
information has an A_z value of 0.82 ± 0.04, similar to the
accuracy of the radiologist for this data set.
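
The A_z values reported in this paper come from a fitted binormal ROC curve (the LABROC program). As a rough illustration of what the area under an ROC curve measures, the sketch below computes the nonparametric (Mann-Whitney) estimate from two lists of scores. This is a substitute technique, not the method used in the study, and the function name `empirical_auc` is hypothetical.

```python
def empirical_auc(malignant_scores, benign_scores):
    """Nonparametric area under the ROC curve (Mann-Whitney estimate).

    Illustration only: the study fits a binormal ROC curve instead.
    The result is the fraction of malignant/benign score pairs that are
    ranked correctly, with ties counted as half.
    """
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))
```

A value of 0.5 corresponds to chance performance and 1.0 to perfect separation of the two classes.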

In this study, the locations of the masses were identified
manually on both the current and the prior mammograms by
a radiologist. This simulated the situation in which a radiologist
finds a mass in either a diagnostic or a screening setting and
calls upon the CAD algorithm for a second opinion on the
likelihood of malignancy of the mass based on the interval
change information. We are developing an automated
regional registration technique that can automatically locate
the mass on the prior mammogram based on its location on
the current mammogram. The location of the mass on the
current mammogram can be identified by a radiologist or by
an automated mass detection algorithm. In the latter case, the
process of mass detection, current and prior mass
registration, and classification can be fully automated. The analysis
of interval change can be incorporated as one of the
functions provided by a CAD system for interpretation of
mammograms.

In this study, we employed a simple measure of temporal
change by taking the difference between the feature from the
current mass and the corresponding feature from the prior
mass. We observed improvement in classification with this
simple temporal information. It will be important to evaluate
other similarity measures that can characterize small
differences in image features of the object of interest. It can be
expected that a more sensitive similarity measure will
provide a better measurement of dissimilarity, or difference,
between the current and prior masses and further improve the
utilization of the temporal change information on
mammograms.


V.  CONCLUSION 

We performed a preliminary study to evaluate the
effectiveness of interval change analysis for classification of
malignant and benign masses on mammograms. It was found
that the difference RLS texture features and spiculation
features were useful for identification of malignancy in
temporal pairs of mammograms. The information on the prior
image was important for characterization of the masses; 5 out
of the 10 selected features contained prior information. We
found that the mass size descriptors were not discriminatory
features for these difficult cases because many of the benign
masses also grew over time. In comparison with the
classification based on image information from the current images
alone, the temporal change information significantly
(p = 0.015) improved the accuracy for classification of the
masses in terms of the total area under the ROC curve (A_z).
The partial area under the ROC curve for the classifier based
on the temporal pairs (A_z^(0.9) = 0.37) is also improved
compared to the classifier based only on the current images
(A_z^(0.9) = 0.32), although the difference did not achieve
statistical significance. Further studies are under way to improve
this temporal change classification technique and to evaluate
its performance on a larger data set.


Medical  Physics,  Vol.  28,  No.  11,  November  2001 



Hadjiiski  et  al.:  Analysis  of  temporal  changes  of  mammographic  features 



ACKNOWLEDGMENTS 

This  work  is  supported  by  a  Career  Development  Award 
from  the  USAMRMC  (No.  DAMD  17-98-1-8211)  (L.H.), 
USPHS Grant No. CA 48129, and a USAMRMC grant (No.
DAMD  17-96-1-6254).  The  content  of  this  publication  does 
not  necessarily  reflect  the  position  of  the  government  and  no 
official  endorsement  of  any  equipment  and  product  of  any 
companies  mentioned  in  the  publication  should  be  inferred. 
The  authors  are  grateful  to  Charles  E.  Metz,  Ph.D.,  for  the 
LABROC  program. 


APPENDIX: RUN LENGTH STATISTICS TEXTURE
FEATURES

A gray level run length is a set of consecutive collinear
pixels all having the same gray level value. The length of the
run is the number of pixels in the run. For a given image it is
possible to compute a gray level run length matrix for runs in
any given direction. In this study, two directions are used:
θ = 0° and θ = 90°. Let p(i,j) be the number of times there
is a run of length j that has a gray level i. Let N_g be the
number of gray levels and N_r be the number of different run
lengths. The short run emphasis is defined as

$$\mathrm{SRE} = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)/j^{2}}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)}.$$

This feature divides the frequency of each run length by
the length of the run squared. This tends to emphasize short
runs. The denominator is the total number of runs in the
image and serves as a normalizing factor. The long run
emphasis is defined as

$$\mathrm{LRE} = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} j^{2}\,p(i,j)}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)}.$$

This feature multiplies the frequency of each run length by
the length of the run squared. This tends to emphasize long
runs.

The gray level nonuniformity is defined as

$$\mathrm{GLN} = \frac{\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_r} p(i,j)\right)^{2}}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)}.$$

This feature squares the number of run lengths for each gray
level. This measures the gray level nonuniformity of the
image. If the runs are equally distributed over all gray levels,
the feature takes on its lowest values. A larger run length
contributes more to the feature value.

Run length nonuniformity is defined as

$$\mathrm{RLN} = \frac{\sum_{j=1}^{N_r}\left(\sum_{i=1}^{N_g} p(i,j)\right)^{2}}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)}.$$

This feature measures the nonuniformity of the run lengths.
If the runs are equally distributed over all lengths, the feature
will have a low value. A larger run count contributes more
to the feature value.

Run percentage is defined as

$$\mathrm{RP} = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)}{P}.$$

This feature is a ratio of the total number of runs to the total
number of possible runs (P) if all runs have a length of one.

The definitions given above are based on Galloway, and
more details can be found in that reference.
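
The appendix definitions can be illustrated with a minimal pure-Python sketch. It assumes the image is a list of rows of integer gray levels, takes θ = 0° as scanning along rows and θ = 90° as scanning along columns, and takes P as the number of pixels in the image. The function names `run_length_matrix` and `rls_features` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def run_length_matrix(image, direction=0):
    """Return p as a dict: p[(i, j)] = number of runs of gray level i, length j.

    direction=0 scans along rows; direction=90 scans along columns
    (an assumed mapping of the two directions used in the study).
    """
    if direction == 0:
        lines = [list(row) for row in image]
    elif direction == 90:
        lines = [list(col) for col in zip(*image)]
    else:
        raise ValueError("only 0 and 90 degrees are sketched here")
    p = defaultdict(int)
    for line in lines:
        start = 0
        for k in range(1, len(line) + 1):
            # A run ends at the end of the line or where the gray level changes.
            if k == len(line) or line[k] != line[start]:
                p[(line[start], k - start)] += 1
                start = k
    return dict(p)


def rls_features(p, num_pixels):
    """Compute SRE, LRE, GLN, RLN and RP from a run length matrix p."""
    total_runs = sum(p.values())
    per_gray = defaultdict(int)    # sum over j of p(i, j), for each gray level i
    per_length = defaultdict(int)  # sum over i of p(i, j), for each run length j
    for (i, j), count in p.items():
        per_gray[i] += count
        per_length[j] += count
    return {
        "SRE": sum(c / j ** 2 for (_, j), c in p.items()) / total_runs,
        "LRE": sum(c * j ** 2 for (_, j), c in p.items()) / total_runs,
        "GLN": sum(v ** 2 for v in per_gray.values()) / total_runs,
        "RLN": sum(v ** 2 for v in per_length.values()) / total_runs,
        "RP": total_runs / num_pixels,
    }
```

For example, `rls_features(run_length_matrix(img, 0), num_pixels)` gives the five features for the θ = 0° direction of one image; in the study these would be computed on gradient magnitude images rather than on raw pixel values.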
ABSTRACT

A new classification scheme was developed to classify mammographic masses as malignant or benign by using interval
change information. The masses on both the current and the prior mammograms were automatically segmented using an
active contour method. From each mass, 20 run length statistics (RLS) texture features, 3 spiculation features, and mass size
were extracted. Additionally, 20 difference RLS features were obtained by subtracting the prior RLS features from the
corresponding current RLS features. The feature space consisted of the current RLS features, the difference RLS features, the
current and prior spiculation features, and the current and prior mass sizes. Stepwise feature selection and linear discriminant
analysis (LDA) classification were used to select and merge the most useful features. A leave-one-case-out resampling
scheme was applied to train and test the classifier using 140 temporal image pairs (85 malignant, 55 benign) obtained from 57
biopsy-proven masses (33 malignant, 24 benign) in 56 patients. An average of 10 features were selected from the 56 training
subsets; 4 difference RLS features, 4 RLS features and 1 spiculation feature from the current image, and 1 spiculation feature
from the prior image were most often chosen. The classifier achieved an average training A_z of 0.92 and a test A_z of 0.88. For
comparison, a classifier was trained and tested using features extracted from the 120 current single images. This classifier
achieved an average training A_z of 0.90 and a test A_z of 0.82. The information on the prior image significantly (p=0.01)
improved the accuracy for classification of the masses.

Keywords: Computer-Aided Diagnosis, Interval Changes, Classification, Feature Analysis, Mammography, Malignancy.

1.  INTRODUCTION 

Mammography is currently the most effective method for early breast cancer detection. Analysis of interval changes is
an important method used by radiologists in mammographic interpretation to detect developing malignancy. A variety of
computer-aided diagnosis (CAD) techniques have been developed to detect mammographic abnormalities and to distinguish
between malignant and benign lesions. We are studying the use of CAD techniques to assist radiologists in interval change
analysis.

Commonly used classification methods for CAD use information from a single image. These methods have been shown
to perform well in lesion classification problems. However, when multiple-year mammograms of a mass are available, it is
not trivial to design computer vision methods to use the temporal information for computer-aided classification and to
improve the differentiation between benign and malignant masses.

The goal of our research is to develop a technique for computerized analysis of temporal differences between a lesion on
the most recent mammogram and a prior mammogram of the same view. The computer algorithm can be used to assist
radiologists in evaluating interval changes and thus distinguishing between malignant and benign masses for CAD. We have
previously presented preliminary results that demonstrated the feasibility of classifying malignant and benign masses based
on interval change analysis. In this study, we continue the development of this approach. Additionally, we compared this
method with a classification method based on information extracted from the current mammogram alone.

2.  CLASSIFICATION  TECHNIQUE 

A  new  classification  scheme  was  developed  to  classify  mammographic  masses  as  malignant  and  benign  by  using  interval 
change  information.  The  technique  is  based  on  the  generation  of  features  that  we  expect  will  represent  adequately  the 
temporal  information  and  will  discriminate  between  malignant  and  benign  masses. 
Figure 1. Block diagram of the classification method.


The mass to be analyzed can either be identified manually by a radiologist or automatically by a computerized detection
program. In this study, the masses were identified by an MQSA radiologist on each mammogram. The masses on both the
current and the prior mammograms were automatically segmented using an active contour method. An example of the
segmentation is shown in Figure 2 and Figure 3 for a malignant and a benign mass, respectively. Features such as texture
features, spiculation features and mass size were extracted from each mass. Additionally, difference features were obtained
by subtracting a prior feature from the corresponding current feature. The feature space consisted of current, prior, and
difference features. Stepwise feature selection with linear discriminant analysis (LDA) was used to select the most
useful features. The selected features were then used as the input predictor variables of the LDA classifier (Figure 1). A
leave-one-case-out resampling scheme was employed to train and test the classifier. The LDA classifier was used in order to
keep the discriminant function simple, thereby reducing the possibility of over-training.
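
The leave-one-case-out scheme can be sketched as follows, assuming a case is one patient, so that all temporal pairs from the held-out patient are tested together and never appear in the training subset. With 56 patients this yields 56 training subsets. The function name `leave_one_case_out` and the case identifiers are hypothetical.

```python
def leave_one_case_out(case_ids):
    """Yield (held_out_case, train_indices, test_indices), one case at a time.

    case_ids[k] is the case (patient) identifier of the k-th temporal pair.
    """
    for case in sorted(set(case_ids)):
        train = [k for k, c in enumerate(case_ids) if c != case]
        test = [k for k, c in enumerate(case_ids) if c == case]
        yield case, train, test
```

Feature selection and classifier training would be repeated inside each split using only the training indices, and the resulting classifier would then score the held-out pairs.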

To evaluate the improvement in performance of the classifier designed by using the temporal change information, an
additional classifier was trained using the information extracted from the current single images of the temporal pairs. We will
refer to these images as current images. Comparison of the two classifiers will reveal the effectiveness of interval change
analysis for classification of malignant and benign masses.

3.  DATA  SET 

A set of 140 temporal pairs of mammograms containing biopsy-proven masses on the current mammograms was used to
examine the performance of this approach. The data set consisted of a total of 241 mammograms from 56 patients. The
mammograms were digitized with a LUMISCAN 85 laser scanner at a pixel resolution of 50 μm × 50 μm and 4096 gray
levels. The digitizer was calibrated so that gray level values were linearly proportional to the optical density (OD) within the
range of 0 to 4 OD units, with a slope of 0.001 OD/pixel value. Outside this range, the slope of the calibration curve
decreased gradually. The digitizer output was linearly converted so that a large pixel value corresponded to a low optical
density. The images were averaged and down-sampled by a factor of 2, resulting in images with a pixel size of 100 μm ×
100 μm for further analysis.
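
The averaging and down-sampling step can be sketched as block averaging: each non-overlapping 2 × 2 block of 50 μm pixels is replaced by its mean, giving one 100 μm pixel. This is a minimal sketch assuming the image is a list of equal-length rows and that any incomplete blocks at the right and bottom edges are discarded; the paper does not describe the edge handling, and the name `average_downsample` is hypothetical.

```python
def average_downsample(image, factor=2):
    """Average non-overlapping factor x factor blocks of a 2D image.

    Rows or columns that do not fill a complete block are dropped
    (an assumption; the source does not state how edges were treated).
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out
```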

The 56 cases contained 57 biopsy-proven masses (33 malignant and 24 benign). The 241 mammograms contained different
mammographic views and multiple years of the masses, including the year when the biopsy was performed. By matching
masses of the same view from two different exams, a total of 140 temporal pairs were formed, of which 85 were malignant
and 55 benign. A malignant temporal pair consisted of a biopsy-proven malignant mass or a mass that was followed up and
found to be malignant by biopsy in a future year. Similar definitions were used for the benign temporal pairs. Within the 140
temporal pairs, a total of 120 mammograms were current mammograms. Of the 120 current mammograms, 70 were
malignant and 50 benign.
Figure 2. A malignant mass: (a) the mass in a prior year mammogram (1997), (b) mass outline
obtained by active contour segmentation, (c) the mass in a current year mammogram
(1998), (d) mass outline obtained by active contour segmentation.


Since  all  cases  in  this  data  set  had  undergone  biopsy,  the  benign  masses  in  this  set  could  not  be  distinguished  easily  from 
the  malignant  ones  based  on  current  mammographic  criteria.  Examples  of  such  cases  are  shown  in  Figure  2  and  Figure  3. 
The  malignant  mass  in  Figure  2  did  not  increase  in  size  but  changed  its  density.  The  benign  mass  (Figure  3),  on  the  other 
hand,  appeared  to  have  spicules.  For  the  malignant  masses  in  this  data  set,  the  average  mass  size  was  8.2  mm  on  the  prior 
mammograms  and  12.7  mm  on  the  current  mammograms.  The  corresponding  sizes  were  10.6  mm  and  12.2  mm,  respectively, 
for  the  benign  masses.  The  temporal  pairs  had  a  time  interval  of  6  to  36  months.  More  than  70%  of  the  pairs  had  a  time 
interval  of  12  months. 



Figure  3.  A  benign  mass:  (a)  the  mass  on  a  prior  year  mammogram  (1995),  (b)  mass  outline 
obtained  by  active  contour  segmentation,  (c)  the  mass  on  a  current  year  mammogram 
(1996),  (d)  mass  outline  obtained  by  active  contour  segmentation. 
4. FEATURE EXTRACTION

A rectangular region of interest (ROI) was defined to include the radiologist-identified mass with an additional
surrounding breast tissue region at least 40 pixels wide from any point of the mass border. A fully automated method was
then used for segmentation of the mass from the breast tissue background within the ROI. The masses on both the current and
the prior mammograms were automatically segmented using a 2D active contour method, initialized by adaptive
thresholding.

The texture features used in this study were calculated from run-length statistics (RLS) matrices. The RLS matrices
were computed from the images obtained by the rubber band straightening transform (RBST). The RBST maps a band of
pixels surrounding the mass onto the Cartesian plane (a rectangular region). In the transformed image, the mass border
appears approximately as a horizontal edge, and spiculations appear approximately as vertical lines. A complete description
of the RBST can be found in the literature.

RLS texture features were extracted from the vertical and horizontal gradient magnitude images, which were obtained by
filtering the RBST image with horizontally or vertically oriented Sobel filters and computing the absolute gradient value of
the filtered image. Five texture measures, namely, short run emphasis, long run emphasis, gray level nonuniformity, run
length nonuniformity, and run percentage, were extracted from the vertical and horizontal gradient images in two directions,
θ = 0° and θ = 90°. Therefore, a total of 20 RLS features were calculated for each ROI. The definition of the RLS
feature measures can be found in the literature.
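
The gradient magnitude images can be sketched with a plain 3 × 3 Sobel filter followed by an absolute value. The sketch below is an illustration under stated assumptions: border pixels are left at zero, and the kernels are named by image axis because the text does not specify which kernel corresponds to the "horizontal" and which to the "vertical" gradient image. The names `SOBEL_X`, `SOBEL_Y` and `abs_gradient` are hypothetical.

```python
# Standard 3x3 Sobel kernels.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to changes along x
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to changes along y


def abs_gradient(image, kernel):
    """Filter a 2D image with a 3x3 kernel and return the absolute response.

    Border pixels are left at 0 (an assumption; the source does not
    describe border handling).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    acc += kernel[dr][dc] * image[r + dr - 1][c + dc - 1]
            out[r][c] = abs(acc)
    return out
```

In the RBST image the mass border is roughly horizontal and spiculations are roughly vertical, so the two filtered images emphasize different structures before the run length statistics are computed.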

The morphological features were extracted from the automatically segmented mass shape, and included features such as
the area, circularity, rectangularity, compactness, and the axis ratio. Spiculation features were extracted by using the
statistics of the image gradient direction relative to the normal direction to the mass border in a ring of pixels surrounding the
mass.

A  total  of  35  features  (20  RLS,  12  morphological  and  3  spiculation)  were  therefore  extracted  from  each  ROI. 
Additionally,  difference  features  were  obtained  by  subtracting  a  prior  feature  from  the  corresponding  current  feature. 
Therefore  20  RLS,  12  morphological  and  3  spiculation  difference  features  were  obtained. 
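
The construction of the input feature space can be sketched as follows: for each feature, the current value, the prior value, and their difference (current minus prior) are all kept as candidates for selection. The function name and the `curr_`, `prior_` and `diff_` prefixes are hypothetical labels chosen for this illustration.

```python
def temporal_feature_vector(current, prior):
    """Combine current, prior and difference (current - prior) features.

    current and prior map feature names to values for the same mass on
    the current and the prior mammogram.
    """
    if current.keys() != prior.keys():
        raise ValueError("current and prior must describe the same features")
    combined = {}
    for name in sorted(current):
        combined["curr_" + name] = current[name]
        combined["prior_" + name] = prior[name]
        combined["diff_" + name] = current[name] - prior[name]
    return combined
```

With 35 features per ROI, this gives 105 candidate features per temporal pair, from which the stepwise procedure selects a small subset.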

5.  FEATURE  SELECTION 

In order to reduce the number of the features and to obtain the best feature subset to design an effective classifier, feature
selection with stepwise linear discriminant analysis was applied. At each step of the stepwise selection procedure, one
feature is entered into or removed from the feature pool based on analysis of its effect on the selection criterion. The stepwise
selection procedure is controlled by a simplex optimization method in such a way that a minimum number of features
are selected to achieve a high accuracy of classification by LDA. More details about the stepwise linear discriminant
analysis and its application to CAD can be found elsewhere.

6.  EVALUATION  METHODS 

To evaluate the classifier performance, the training and test discriminant scores were analyzed using receiver operating
characteristic (ROC) methodology. The discriminant scores of the malignant and benign masses were used as decision
variables in the LABROC1 program, which fits a binormal ROC curve based on maximum likelihood estimation. The
classification accuracy was evaluated as the area under the ROC curve, A_z. The performances of the classifiers were also
assessed by estimation of the partial area index (A_z^(0.9)). The partial area index is defined as the area that lies under the
ROC curve but above a sensitivity threshold of 0.9 (TPF_0 = 0.9), normalized to the total area above TPF_0, (1 - TPF_0). The
partial A_z indicates the performance of the classifier in the high sensitivity (low false negative) region, which is most
important for a cancer detection task.
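
The definition of the partial area index can be illustrated numerically. The sketch below works on a piecewise-linear ROC curve given as operating points, whereas the study evaluates the index on a fitted binormal curve, so this is an approximation under that assumption. The function name `partial_area_index` is hypothetical.

```python
def partial_area_index(fpf, tpf, tpf0=0.9):
    """Area under the ROC curve and above TPF = tpf0, divided by (1 - tpf0).

    fpf, tpf: ROC operating points sorted by increasing FPF, running from
    (0, 0) to (1, 1), joined by straight segments (an assumption; the
    study uses a fitted binormal curve instead).
    """
    area = 0.0
    for k in range(1, len(fpf)):
        x0, x1 = fpf[k - 1], fpf[k]
        y0, y1 = tpf[k - 1] - tpf0, tpf[k] - tpf0
        if x1 == x0 or y1 <= 0:
            continue  # zero-width segment, or segment not above the threshold
        if y0 < 0:
            # Clip the segment at the point where it crosses TPF = tpf0.
            x0 = x0 + (x1 - x0) * (-y0) / (y1 - y0)
            y0 = 0.0
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area / (1.0 - tpf0)
```

A perfect classifier gives an index of 1, and the chance diagonal gives (1 - TPF_0)/2, which is 0.05 for TPF_0 = 0.9.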

7.  CLASSIFICATION  RESULTS 

For the data set used in this study, an average of 10 features were selected from the 56 training subsets. The most
frequently selected features included 4 difference RLS features, 4 RLS features and 1 spiculation feature from the current
image, and 1 spiculation feature from the prior image. The LDA classifier achieved an average training A_z of 0.92 and a test A_z of
0.88. The LDA classifier using features extracted from the current single images (the current images of the temporal pairs)
achieved an average training A_z of 0.90 and a test A_z of 0.82. An average of 11 features were selected from the 56 training
subsets. The most frequently selected features were 4 RLS features, 1 spiculation feature from the current image, and 6
morphological features. The difference in the test A_z between the two classifiers is statistically significant (p=0.01). The
classifier based on temporal pairs achieved a test partial A_z^(0.9) of 0.37 and the classifier based on current images achieved a
test A_z^(0.9) of 0.32. These results are summarized in Table 1.


Table 1. Classification results for the classifier based on the temporal change information and the classifier
based on current single image information.

Classification    Avg. no. of selected features    Training A_z    Test A_z        Test partial A_z^(0.9)
Temporal pairs    10                               0.92            0.88 ± 0.028    0.37 ± 0.1
Current images    11                               0.90            0.82 ± 0.038    0.32 ± 0.08

8.  CONCLUSION 

The difference RLS texture features and spiculation features were useful for identification of malignancy in temporal
pairs of mammograms. The information on the prior image was important for characterization of the masses; 5 out of the 10
selected features contained prior information. We found that the size of the mass was not a discriminatory feature for these
difficult cases because many of the benign masses also grew over time. The temporal change information significantly
(p=0.01) improved the accuracy for classification of the masses in terms of the total area under the ROC curve (A_z). The
partial area under the ROC curve is also improved for the classifier based on current and prior images (A_z^(0.9) = 0.37)
compared to the classifier based only on the current images (A_z^(0.9) = 0.32), although the difference did not achieve statistical
significance. Further studies are underway to improve this temporal change classification technique and to evaluate its
performance on a larger data set.
An Adaptive Similarity Measure for Automated Identification of Breast Lesions in
Temporal Pairs of Mammograms for Interval Change Analysis

Lubomir Hadjiiski, Berkman Sahiner, Heang-Ping Chan, Nicholas Petrick, Mark A. Helvie

PURPOSE:  An  adaptive  similarity  measure  (ASM)  is  designed  to  improve 
automated  identification  of  corresponding  lesions  on  prior  mammograms.  This 
technique  is  the  basis  for  interval  change  analysis  of  breast  lesions  in  CAD 
applications. 

MATERIALS AND METHODS: A new class of similarity measures (SM) is
proposed. It combines adaptive filtering to enhance the lesion with an SM used as a
figure-of-merit (FOM). The filters are designed with a training set to maximize the
FOM for similar image pairs and minimize it for dissimilar image pairs by using a
gradient optimization technique.

The  ASM  was  applied  to  the  final  stage  of  our  multistage  regional  registration 
technique  for  mass  identification  on  the  prior  mammogram.  A  search  for  the  best 
match  between  the  lesion  template  from  the  current  mammogram  and  a  structure  on 
the  prior  mammogram  was  carried  out  within  a  search  region,  guided  by  the  ASM. 
This  new  approach  was  evaluated  by  using  179  temporal  pairs  of  mammograms 
containing  biopsy-proven  masses. 

RESULTS:  86%  of  the  estimated  lesion  locations  resulted  in  an  area  overlap  of  at 
least  50%  with  the  true  lesion  locations.  The  average  distance  between  the  estimated 
and  the  true  lesion  centroids  on  the  prior  mammogram  was  4.5  ±  6.7  mm.  In 
comparison,  the  correct  localization  and  the  average  distance  using  a  conventional 
correlation  SM  were  84%  and  4.9  ±  7.0  mm,  respectively. 

CONCLUSION:  The  ASM  improved  the  identification  of  the  corresponding  lesions 
on  temporal  pairs  of  mammograms.  Further  studies  are  underway  to  improve  the 
technique,  expand  it  to  different  types  of  SM,  and  evaluate  its  accuracy  on  a  larger 
data  set. 
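
For orientation, the conventional correlation SM used as the comparison baseline can be sketched as a normalized correlation between the lesion template and each candidate patch in the search region. This is not the ASM itself, whose adaptive filters are not specified in this abstract. The exhaustive search over every patch position and the function names `normalized_correlation` and `best_match` are assumptions made for illustration.

```python
import math


def normalized_correlation(a, b):
    """Pearson correlation between two equally sized 2D patches."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den > 0 else 0.0


def best_match(template, search_region):
    """Return (score, (row, col)) of the best-matching patch position."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None
    for r in range(len(search_region) - th + 1):
        for c in range(len(search_region[0]) - tw + 1):
            patch = [row[c:c + tw] for row in search_region[r:r + th]]
            score = normalized_correlation(template, patch)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos
```

The returned position is the top-left corner of the best-matching patch within the search region.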
Computerized Regional Registration of Corresponding Microcalcification
Clusters on Temporal Pairs of Mammograms for Interval Change Analysis

L.M. Hadjiiski, PhD, Ann Arbor, MI (hadjisk@umich.edu) • H. Chan, PhD • N.A.
Petrick, PhD • B. Sahiner, PhD • M.N. Gurcan, PhD • M.A. Helvie, MD • et al
PURPOSE: To develop a regional registration technique for identifying
corresponding microcalcification clusters on current and prior
mammograms of the same view. The technique will be useful for computerized
analysis of interval changes of microcalcification clusters in
computer-aided diagnosis (CAD).

METHOD AND MATERIALS: A multi-stage regional registration technique
is being developed. In the first stage, an initial fan-shaped search
region  was  estimated  on  the  prior  mammogram  based  on  the  cluster 
location  on  the  current  mammogram.  In  the  second  stage,  detection  of 
cluster  candidates  within  the  search  region  was  performed  with  an 
automated  cluster  search  program.  The  cluster  (TP)  on  the  current  image 
was  paired  with  every  detected  cluster  (TP  or  FP)  in  the  search  region.  In 
the  final  stage,  a  correspondence  classifier  was  designed  to  reduce  the  false 
pairs  (TP-FP)  within  the  search  region.  Texture  and  morphological  features 
were extracted from the clusters on the current and the prior mammograms.
Similarity measures were derived from the extracted features of the
TP  or  FP  clusters  for  each  temporal  pair.  Stepwise  feature  selection  with 
simplex  optimization  was  used  to  select  the  optimal  feature  subset.  A 
linear  discriminant  classifier  was  used  to  merge  the  selected  features  for 
classification  of  the  TP-TP  and  TP-FP  cluster  pairs.  In  this  preliminary 
study,  a  data  set  of  51  temporal  pairs  of  mammograms  from  19  patients 
containing  biopsy-proven  microcalcification  clusters  was  used.  The  true 
cluster locations were identified by an MQSA radiologist. A leave-one-case-out
training and testing resampling scheme was used for feature selection
and  classification. 
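
A leave-one-case-out evaluation of a linear discriminant classifier can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it uses a plain two-class Fisher discriminant, omits the stepwise feature selection with simplex optimization, and the names `fit_lda`, `leave_one_case_out_scores`, and `case_ids` are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class Fisher linear discriminant; returns weights w and offset b."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter, lightly regularized for numerical stability.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

def leave_one_case_out_scores(X, y, case_ids):
    """Score every sample with a classifier trained on all other cases.

    All samples that share a case id (for example, all temporal pairs of one
    patient) are held out together, so no case contributes to both training
    and testing. Each training split must contain both classes.
    """
    scores = np.empty(len(y))
    for case in np.unique(case_ids):
        test = case_ids == case
        w, b = fit_lda(X[~test], y[~test])
        scores[test] = X[test] @ w + b
    return scores
```

Holding out whole cases matters here because the data set contains several temporal pairs per patient; splitting pairs from one patient across training and test sets would bias the test estimate optimistically.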

RESULTS: Using a search region with an average area of 1350 mm²
allowed all clusters of interest to be localized in the search region. The
average distance between the estimated and the true centroid of the
microcalcification clusters on the prior mammogram was 7.9 ± 4.1 mm after
the first stage. The cluster search program detected 90% (46/51) of the true
clusters with an average of 0.69 FP clusters within the search region on the
prior mammograms. The correspondence classifier reduced the FP rate to
an average of 0.41 FP clusters at the cost of misclassifying 1 true pair.

CONCLUSIONS: Our preliminary study demonstrated that the regional
registration technique is a promising approach for identifying corresponding
microcalcification clusters on temporal pairs of mammograms. Further
studies are underway to improve the technique and to evaluate its
accuracy on a larger data set.
Computer-Aided Characterization of Malignant and Benign
Microcalcification Clusters Based on the Analysis of Temporal
Change of Mammographic Features

Lubomir Hadjiiski, Heang-Ping Chan, Metin Gurcan, Berkman Sahiner,
[...] Roubidoux (Department of Radiology, The University of Michigan,
Ann Arbor, MI 48109-0904)

[...] demonstrated that interval change analysis can improve
differentiation of malignant and benign masses. In this study a
new classification scheme using interval change information was [...]
mammographic microcalcification clusters as [...] morphological
features (MF) were extracted [...]. Twenty difference RLSF were obtained
by subtracting a prior RLSF from [...] of [...], the difference RLSF, and
the current and prior MF. A leave-one-case-out resampling was used to
train and test the classifier using 65 temporal image pairs (19 malignant,
46 benign) containing biopsy-proven microcalcification clusters. Stepwise
feature selection and a linear discriminant classifier, designed with the
training subsets alone, were used to select and merge the most useful
features. An average of 10 features were selected from the training
subsets, of which 2 difference RLSF and 6 MF were consistently selected
from most of the training subsets. The classifier achieved an average
training Az of 0.97 and a test Az of 0.85. For comparison, a classifier
based on the current single image features achieved an average training
Az of 0.93 and test Az of 0.79. These results indicate that the use of
temporal information improved the accuracy of microcalcification
characterization.
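
The two quantitative ingredients of this abstract, difference features and the area under the ROC curve, can be sketched as below. The sketch rests on assumptions: the difference is taken as current minus prior, following the statement that a prior feature is subtracted, and the area is estimated with the nonparametric Mann-Whitney statistic, which stands in for whatever ROC fitting procedure the authors used. The function names are hypothetical.

```python
import numpy as np

def difference_features(current, prior):
    """Difference features: the prior feature vector subtracted from the current one."""
    return np.asarray(current, dtype=float) - np.asarray(prior, dtype=float)

def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve.

    labels: 1 for malignant, 0 for benign. Ties count as half a correct ranking.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))
```

The value returned by `empirical_auc` is the probability that a randomly chosen malignant case receives a higher classifier score than a randomly chosen benign case, which is one way to read the training and test areas quoted in the abstract.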


